docs(v2): refine batch import architecture
Expand the batch auto-import V2 spec and TDD plan with stability requirements, result state persistence, and result page design. Add a dedicated architecture document for run state, APIs, pages, and UI field layout, and sync the execution board to the new V2 scope.
@@ -1,17 +1,21 @@
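# SPEC: Batch Auto-Import by URL + Key (v2)

Date: 2026-05-21
Technical architecture: `docs/2026-05-22-BATCH_AUTO_IMPORT_V2_ARCHITECTURE.md`

## 1. Objective

Let an administrator supply only a batch of `(base_url, api_key)` pairs and have the system automatically complete:

1. **Upstream probing**: call `GET {base_url}/v1/models` to dynamically fetch the list of models the key supports
2. **Host evolution**: compare the discovered models with the host channel configuration and automatically extend `model_mapping`
3. **Provider registration**: register the URL + key as a controllable, manageable provider
4. **Relay closed-loop validation**: run one `/v1/chat/completions` request with the key to confirm it really works
1. **Upstream discovery**: call `GET {base_url}/v1/models` plus a minimal smoke request to dynamically obtain the list of models the key actually supports
2. **Name correction**: automatically compare "manually mistyped model names" against what the upstream really returns, then normalize and correct them
3. **Capability profiling**: record how this upstream/model differs in OpenAI/Anthropic compatibility, Responses support, stream/tool calls, and so on
4. **Host evolution**: compare the discovery results with the host channel / account configuration and automatically extend `model_mapping`
5. **Async confirmation**: apply delayed confirmation to cases where the account was created but the host's asynchronous probe / scheduling has not yet settled, so that a transient failure is not immediately recorded as final
6. **Relay closed-loop validation**: run a real `/v1/chat/completions` validation with the managed key and settle the final `active/degraded/broken`
7. **Observable state**: persist the stage results for every run, item, model, account, and provider, and provide pages for viewing import status

The whole flow requires **no pre-provisioned provider manifest**, does not depend on a pack, and needs zero manual judgment.
The goal is not "absolutely zero manual work" but to shrink manual input to a minimum and hand the parts that are easy to mistype or misjudge over to the system for automatic confirmation.
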
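## 2. Why this is needed now

@@ -20,57 +24,121 @@
- **New keys are not plug-and-play**: every time an unfamiliar provider URL is onboarded, someone has to read its docs and then write a manifest
- **Model lists are maintained by hand**: when a provider upgrades its upstream models, the pack does not sync automatically
- **The debugging loop is long**: prepare a manifest → import → discover the channel is missing models → patch by hand → re-import
- **Model names are easy to mistype**: for example `minimax-m27-highspeed` versus `MiniMax-M2.7-highspeed`; manual input is highly error-prone
- **Chinese-vendor models vary widely in compatibility**: many "OpenAI-compatible" endpoints support only `/chat/completions`, not `/responses`, `tools`, or `stream_options`
- **The host has asynchronous windows**: account creation, the Responses probe, scheduler warm-up, and the account-selectable status update do not complete atomically, so a single immediate check easily yields a false negative
- **Long-running jobs are not stable enough**: a batch import spans multiple stages; without state persistence, retry boundaries, and result projections, it is hard to tell after a failure which step it stalled on
- **Results are not visible**: today, review relies mainly on the CLI, logs, and artifacts; there is no dedicated page for import status and account/model details

v2 compresses "probe → configure → register → validate" into a **one-click closed loop**.
v2 needs to compress "probe → configure → register → async confirm → validate" into a **one-click closed loop**.
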
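## 3. Core user story

> As an administrator, I have a new batch of relay keys (URL + token) and want to enable these models quickly on a host that is already running. Ideally I list the keys and the system automatically probes which models each key supports, configures the host channel, registers a controllable provider, and runs a real completion test, then tells me which ones are actually usable.
> As an administrator, I have a new batch of relay keys (URL + token) and want to enable these models quickly on a host that is already running. Ideally I list the keys and the system automatically probes which models each key supports, corrects model names, identifies compatibility capabilities, configures the host channel, registers a controllable provider, and asynchronously confirms account and closed-loop status, then tells me directly in the control-plane page which ones are truly usable, which are only temporarily unstable, and which need a specific compatibility strategy.
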
## 4. Technical approach

### 4.1 Three-stage pipeline
### 4.1 Four-stage pipeline + run-state persistence

```
Input: [(base_url, api_key), ...]

Stage 0: Run Setup ──────────────────────────────────────────────
  create import_run
    → persist operator input / retry policy / timestamps
    → assign run_id and item_ids

Stage 1: Probe ─────────────────────────────────────────────────
  for each (url, key):
    upstream_models = GET {url}/v1/models
      → extract model list
    upstream_capabilities = probe endpoint compatibility
      → /models | /chat/completions | /responses | /messages
    upstream_completion = POST {url}/v1/chat/completions (smoke)
      → HTTP status, latency, error_type
      → HTTP status, latency, error_type, usable_model
    classify: models_ok | models_fail | completion_fail | unreachable
    normalize model ids and select smoke model automatically

Stage 2: Provision ──────────────────────────────────────────────
  for each (url, key) where upstream_models != models_fail:
    host_channel = find_or_create_channel(provider_id, url)
    missing_models = upstream_models - host_channel.model_mapping.keys
    host_channel = find_or_create_channel(provider_id, url, capability_profile)
    missing_models = normalized_models - host_channel.model_mapping.keys
    if missing_models:
      patch_channel(host_channel, add model_mapping entries)
    managed_account = create_or_update_account(url, key)
    probe_result = account_test(managed_account, smoke_test_model)
    register_provider_binding(provider_id, url, key, upstream_models)
    managed_account = create_or_update_account(url, key, normalized_models)
    register_provider_binding(provider_id, url, key, normalized_models, capability_profile)

Stage 3: Validate ───────────────────────────────────────────────
  for each registered (url, key):
Stage 3: Async Confirm ──────────────────────────────────────────
  for each registered account:
    async account confirm:
      re-check account models
      re-check account test (after host async probe settles)
      re-check temporary 503/no available accounts windows
      → write confirmation_status: pending | confirmed | warning | failed

Stage 4: Validate ───────────────────────────────────────────────
  for each confirmed account:
    final_completion = POST host_gw/v1/chat/completions
      via managed_account key
      → write access_status: active | broken | degraded
  persist final run summary and UI-facing status projections
  output: per-url status + summary

Output: BatchImportResult {
  run_id: string
  total: int
  active: int
  broken: int
  degraded: int
  details: [{url, upstream_models, channel_config, access_status, error}]
  details: [{url, normalized_models, capability_profile, confirmation_status, access_status, error}]
}
```

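### 4.1.1 Why async confirmation is required

Real acceptance runs have shown that "account created" does not mean "immediately verifiable":

1. The host persists its `/responses` capability probe for third-party OpenAI-compatible upstreams asynchronously
2. Right after an account is created, the first `/accounts/:id/test` may still take the old path and return a temporary `403 Forbidden`
3. Even after channel / group / subscription have been written, the first `/v1/chat/completions` may briefly hit `503 no available accounts`
4. A few hundred milliseconds to a few seconds later, the same path recovers to `200`

v2 therefore can no longer decide an item's fate with a "run one synchronous test right after creation" strategy. It must distinguish:

- **Submitted successfully**
- **Async confirmation in progress**
- **Finally confirmed as success/failure**
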
### 4.1.2 State machine

Each import item should have at least the following state machine:

```
discovered
  → provisioned
    → confirming
      → confirmed_active
      → confirmed_warning
      → confirmed_broken
```

Where:

- `provisioned`: host resources have been created, but the item must not be announced as ready
- `confirming`: waiting for the host's async probe / account warm-up / gateway scheduling to settle
- `confirmed_warning`: the path works but carries advisory risks, such as a probe 403 race or limited compatibility
- `confirmed_broken`: still unusable after retries and delayed confirmation

Every state transition must be persisted, not kept only in memory. The control plane must at least be able to recover:

- Which stage the current run has reached
- Which items have completed
- Which items are still confirming
- Which items have entered the next retry because of a transient error

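The state machine above can be expressed as a small transition table. The following is a sketch only: the status strings come from the spec, while `CanTransition`, the `allowed` table, and the assumption that transitions are strictly forward (no direct jump from `discovered` to a terminal state) are illustrative choices that the spec does not confirm.

```go
package main

import "fmt"

// Stage status values taken from the state machine in the spec.
const (
	Discovered       = "discovered"
	Provisioned      = "provisioned"
	Confirming       = "confirming"
	ConfirmedActive  = "confirmed_active"
	ConfirmedWarning = "confirmed_warning"
	ConfirmedBroken  = "confirmed_broken"
)

// allowed lists the forward transitions assumed by this sketch.
// Terminal states have no entry, so nothing can leave them.
var allowed = map[string][]string{
	Discovered:  {Provisioned},
	Provisioned: {Confirming},
	Confirming:  {ConfirmedActive, ConfirmedWarning, ConfirmedBroken},
}

// CanTransition reports whether moving from one status to another is
// permitted by the table above.
func CanTransition(from, to string) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Provisioned, Confirming))     // prints: true
	fmt.Println(CanTransition(Discovered, ConfirmedActive)) // prints: false
}
```

A store that persists transitions could call a check like this before each write, so that an out-of-order update is rejected rather than silently recorded.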
### 4.2 Key design decisions

#### Q1: How is the model list extracted from `/v1/models`?
#### Q1: How is the model list extracted from `/v1/models` and corrected?

The OpenAI-compatible upstream response format is:
```json
@@ -80,9 +148,27 @@ The OpenAI-compatible upstream response format is:
```

Extraction strategy:
- Take `data[].id` as the model name
- Filter out obviously non-target models whose names start with `gpt-` / `claude-` / `text-` / `embedding-`
- Keep the rest as the "discovered model list"
- Take `data[].id` as the raw upstream model name
- Keep `raw_model_id`
- Also generate `normalized_model_id`
- By default, do not filter out names that "look like GPT"; record the raw value in full, then decide from the provider host / capability profile whether it is a target model

Normalization rules cover at least:
- Case normalization
- Hyphen / dot differences
- `vendor/model` prefix stripping
- Common alias mapping

Examples:

| raw | normalized |
|---|---|
| `MiniMax-M2.7-highspeed` | `minimax-m2.7-highspeed` |
| `minimax-m27-highspeed` | `minimax-m27-highspeed` |
| `deepseek-ai/DeepSeek-V4-Pro` | `deepseek-v4-pro` |
| `Kimi-K2.6` | `kimi-k2.6` |

The system should no longer trust manually entered model names by default; it should prefer what the key actually returns when probed.

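Two of the normalization rules above (case normalization and `vendor/model` prefix stripping) are enough to reproduce the example table. The sketch below assumes exactly those two rules; alias mapping and hyphen/dot handling are deliberately left out, and the body is an illustration rather than the project's actual `NormalizeModelID`.

```go
package main

import (
	"fmt"
	"strings"
)

// NormalizeModelID lower-cases a raw upstream model ID and strips any
// "vendor/" prefix. Alias mapping and dot/hyphen reconciliation are
// not handled in this sketch.
func NormalizeModelID(raw string) string {
	id := strings.TrimSpace(raw)
	if i := strings.LastIndex(id, "/"); i >= 0 {
		id = id[i+1:]
	}
	return strings.ToLower(id)
}

func main() {
	for _, raw := range []string{
		"MiniMax-M2.7-highspeed",
		"deepseek-ai/DeepSeek-V4-Pro",
		"Kimi-K2.6",
	} {
		fmt.Printf("%s -> %s\n", raw, NormalizeModelID(raw))
	}
}
```

Note that `minimax-m27-highspeed` normalizes to itself under these rules, so catching that kind of typo still requires comparing against the upstream list or an alias table.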
#### Q2: How are upstream models written into the host channel?

@@ -91,7 +177,15 @@ The OpenAI-compatible upstream response format is:
- `restrict_models: bool`: when true, the gateway routes only models present in the mapping

Strategy:
- `model_mapping[key] = key` (one-to-one mapping; the upstream model name is the gateway model name)
- The channel keeps all of:
  - `raw_model_id`
  - `normalized_model_id`
  - the final externally visible gateway model name
- The default behavior is:
  - `gateway_model = normalized_model_id`
  - `upstream_model = raw_model_id`
- If the host side must keep routing by the original name, at least persist the alias relationship in the profile, and have later imports and reconciliation compare from the normalized view

- `model_pricing` is filled with defaults (`price_per_1m=0`, `max_batch=0`) and does not block the import
- If the channel does not exist, create a new one (`name = host_registered_{provider_id}`)

@@ -122,13 +216,45 @@ Stage 1's smoke test needs to distinguish error types:

The result of the Stage 3 host relay smoke test is what determines the final `access_status`.

#### Q6: How are compatibility capabilities recorded so the same pitfalls are not hit again on every import?

v2 must introduce the concept of a `capability_profile`. It records at least:

```json
{
  "supports_openai_models": true,
  "supports_openai_chat_completions": true,
  "supports_openai_responses": false,
  "supports_anthropic_messages": false,
  "supports_stream": true,
  "supports_tools": "unknown",
  "supports_reasoning_fields": "unknown",
  "auth_style": "bearer",
  "model_id_style": "vendor_prefixed | canonical | mixed",
  "known_advisories": [
    "responses_403_third_party",
    "initial_account_probe_race",
    "gateway_no_available_accounts_warmup"
  ]
}
```

The profile is not there to "look nice"; it drives fast strategy matching later on:

- Which providers need to skip `/responses`
- Which providers should prefer raw `/chat/completions`
- Which providers should enable completion retry
- Which providers need model names normalized before comparison
- Which providers need the Anthropic-compatible entry point

### 4.3 Data flow

```
BatchImportRequest
├── base_url: string
├── api_key: string
└── access_mode: "subscription" | "self_service" (optional, default subscription)
├── access_mode: "subscription" | "self_service" (optional, default subscription)
└── requested_models: []string (optional; a hint, not a source of truth)

BatchImportResult
├── batch_id: string
@@ -141,14 +267,59 @@ BatchImportResult
ImportItemResult
├── base_url: string
├── provider_id: string (auto-generated)
├── upstream_models: []string (discovered in Stage 1)
├── upstream_models: []string (raw models discovered in Stage 1)
├── normalized_models: []string (models after normalization)
├── resolved_smoke_model: string
├── capability_profile: object
├── channel_id: int64 (created/updated in Stage 2)
├── account_id: int64 (created/updated in Stage 2)
├── probe_ok: bool (Stage 2 account test)
├── probe_ok: bool (final result of the Stage 3 account test)
├── confirmation_status: string
├── access_status: string (final, Stage 3)
├── stage_status: string (discovered | provisioned | confirming | confirmed_*)
├── advisory_messages: []string
├── retry_count: int
├── last_error_stage: string | null
└── error: string | null
```

The following run-state persistence objects are added:

```text
ImportRun
- run_id
- mode
- access_mode
- total_items
- completed_items
- active_items
- degraded_items
- broken_items
- state (running | completed | completed_with_warnings | failed | cancelled)
- started_at
- updated_at
- finished_at

ImportRunItem
- run_id
- item_id
- base_url
- provider_id
- current_stage
- stage_status
- requested_models
- normalized_models
- resolved_smoke_model
- channel_id
- account_id
- confirmation_status
- access_status
- retry_count
- advisory_messages
- last_error_stage
- last_error
```

### 4.4 CLI interface

```bash
@@ -179,6 +350,65 @@ https://api.deepseek.com,sk-xxx
https://api.completion.com,sk-yyy
```

CLI output must reference the `run_id` and be able to print the result page entry directly:

```text
run_id: batch-20260522-001
result_page: /batch-import/runs/batch-20260522-001
```

### 4.5 Result viewing API and pages

v2 no longer offers CLI output only; it must provide a minimum viable way to view results in the control plane.

#### HTTP API

```text
GET /api/batch-import/runs
GET /api/batch-import/runs/{run_id}
GET /api/batch-import/runs/{run_id}/items
GET /api/batch-import/runs/{run_id}/items/{item_id}
```

Purpose:

- List recent batches
- View the overall statistics of a batch
- View the stage results for each URL / provider / account
- View model corrections, the capability profile, advisories, and retry history

#### Pages

Provide at least one simple result page:

```text
/batch-import/runs
/batch-import/runs/{run_id}
```

Minimum page requirements:

- Batch list page:
  - run_id
  - started_at / finished_at
  - total / active / degraded / broken
  - overall state
- Batch detail page:
  - base_url / provider_id of each item
  - requested_models / normalized_models / resolved_smoke_model
  - capability_profile summary
  - channel_id / account_id
  - confirmation_status / access_status
  - advisory_messages
  - last_error_stage / last_error

The pages are not meant to be a complex frontend; they should let operations and developers quickly answer:

- Which import is stuck
- Which stage it is stuck in
- Whether the cause is a wrong model name, unsupported compatibility, a probe race, or a completion failure
- Whether a warning is transient or ultimately needs manual handling

## 5. Host hard constraints (inherited from v1)

- Do not modify host source code
@@ -186,6 +416,7 @@ https://api.completion.com,sk-yyy
- Work only through the host HTTP Admin API and Gateway API
- The fields that fully close out a channel must all be present: `model_mapping` + `model_pricing` + `restrict_models=true` + `billing_model_source=channel_mapped`
- `/v1/models` and `/v1/chat/completions` are two independent acceptance layers
- The result pages and run state may read only the control plane's own state store, never the host database

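## 6. Access closed loop

@@ -197,15 +428,40 @@ Stage 3's `access_status` determines real availability:
| `degraded` | Stage 1/2 OK, but the Stage 3 completion is abnormal | ⚠️ rate-limited/unstable |
| `broken` | Stage 1 probe failed or Stage 2 account test failed | ❌ |

Additional constraints:

- `requested_models` is only a hint, not an acceptance criterion
- Only a `resolved_smoke_model` that succeeded in an actual upstream probe can serve as the final smoke model
- For a third-party upstream's first `403 Forbidden` account probe, if `/models` already succeeded and the capability profile is identified as `responses_unsupported`, the item should first enter `warning/confirming` rather than going straight to `broken`
- For a transient `503 no available accounts` right after import, the item should first enter a short retry window rather than being marked as a final failure immediately
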
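## 7. Error recovery strategy

- Stage 1 fails: record `upstream_unreachable` and skip Stage 2/3
- Stage 2 partially fails: resources already created are kept (no automatic rollback)
- Stage 3 fails: access_status is downgraded, but created resources are not deleted
- Stage 3 first failure: enter `confirming` and decide whether to retry based on the capability profile and the transient classification
- Stage 4 final failure: access_status is downgraded, but created resources are not deleted
- Whole batch interrupted: handled according to `--mode strict | partial`
  - `strict`: if any item fails, the whole batch stops and reports what has completed
  - `partial` (default): failed items are recorded individually and successful ones continue

Three categories of recovery strategy need to be added:

1. **Model name correction recovery**
   - If the requester explicitly specified a model name but upstream `/models` did not return it
   - The system should attempt normalized comparison and alias matching
   - If there is still no match, return a "recommended model name" rather than blindly creating a wrong configuration

2. **Compatibility capability recovery**
   - If `/responses` fails but `/chat/completions` succeeds
   - The profile should explicitly mark `supports_openai_responses=false`
   - Later imports for the same kind of provider skip the responses probe by default

3. **Run-state stability recovery**
   - An item's stage results, retry_count, and last_error_stage must be persisted
   - After a control-plane restart, historical run results must remain viewable
   - If resume is supported in the future, resumed runs must be explicitly distinguished from the original run
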
## 8. Relationship to v1

v2 does **not replace** v1; it adds a parallel entry point:
@@ -225,14 +481,23 @@ v2's provider binding reuses v1's existing `managed_resources` and `import_batches
internal/
  probe/                    # new: upstream probing module
    models.go               # GET /v1/models parsing
    aliases.go              # model name normalization / alias comparison
    completion.go           # smoke test POST /v1/chat/completions
    capability.go           # /responses / /messages / stream / tools capability probing
    classifier.go           # error classification (auth/rate_limit/upstream/unreachable)
  batch/                    # new: batch import orchestration
    service.go              # BatchImportService: pipeline orchestration
    provider_id.go          # URL → provider_id normalization
    channel_evolution.go    # model_mapping extension logic
    confirmation.go         # async confirmation state machine / retry policy
    capability_profile.go   # persistence of and decisions from the provider/model capability profile
    run_state.go            # import run / item persistence model
    status_projection.go    # statistics projections for the list / detail pages
  host/sub2api/
    channel.go              # new: PatchChannel(channel_id, add_model_mapping)
  app/
    http_batch_import.go    # batch import API
    http_batch_runs.go      # run list / detail API and pages
cmd/
  cli/
    batch_import.go         # new: batch-import command
@@ -244,20 +509,31 @@ tests/integration/

### Unit tests
- `probe/models_test.go`: model list parsing, covering OpenAI format variants
- `probe/aliases_test.go`: model name normalization, prefix stripping, hints for common misspellings
- `probe/capability_test.go`: OpenAI/Anthropic/Responses compatibility probing
- `probe/classifier_test.go`: error type classification
- `batch/provider_id_test.go`: URL → provider_id normalization
- `batch/channel_evolution_test.go`: model_mapping extension delta computation
- `batch/confirmation_test.go`: async confirmation window, brief 503 retry, advisory downgrade
- `batch/capability_profile_test.go`: compatibility → routing strategy decisions
- `batch/run_state_test.go`: run/item state persistence and status projection
- `batch/service_test.go`: pipeline orchestration mock tests
- `app/http_batch_import_test.go`: result API / page output

### Integration tests
- `tests/integration/batch_import_test.go`
  - Two (url, key) pairs, full probe + provision + validate flow
  - strict mode: any failure stops the whole batch
  - partial mode: failed items are isolated
  - First account test returns `403 Forbidden`; after async confirmation it becomes warning/active
  - First completion returns `503 no available accounts`; after retry it becomes active
  - When `requested_models` is wrong, `normalized_models/recommended_model` is returned
  - Querying run detail during an import shows stage progress and retry_count changes
  - After the import finishes, the page/API shows the run summary and item details

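## 11. Not in scope (outside v2)

- Web UI / HTTP API entry point (get the CLI working first)
- Automatic generation of pricing policy (record defaults and an unconfirmed state first)
- Automatic discovery of the provider's channel pricing (model pricing is left empty until the user configures it)
- Load-balancing strategy across multiple keys
- Reconciliation scheduler (reconcile is provided by v1)
@@ -271,7 +547,9 @@ tests/integration/
5. When Stage 3 fails, access_status is downgraded correctly (broken/degraded)
6. In `strict` mode, any item failure stops the whole batch and is reported
7. In `partial` mode, successful items are not interrupted by failed items
8. The whole flow does not modify host source code or write to the host database
8. The result page shows each run / item's status, advisories, retry history, and final access status
9. After a control-plane restart, historical run results remain viewable
10. The whole flow does not modify host source code or write to the host database

## 13. Open questions (decided)
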
@@ -1,269 +1,590 @@
# TDD Implementation Plan v2: Batch Auto-Import

Date: 2026-05-21
Technical architecture: `docs/2026-05-22-BATCH_AUTO_IMPORT_V2_ARCHITECTURE.md`

## Goal

Let an administrator supply only `(base_url, api_key)` and have the system automatically complete:

1. Upstream model discovery
2. Model name normalization and correction
3. Capability profile generation
4. Host resource creation and channel evolution
5. Async confirmation of the account and the host's stable state
6. Final `/v1/chat/completions` closed-loop validation
7. Run-state persistence and result recovery
8. Result viewing API and pages

This plan stays consistent with [2026-05-21-BATCH_AUTO_IMPORT_SPEC.md](./2026-05-21-BATCH_AUTO_IMPORT_SPEC.md) and focuses on turning the lessons learned over multiple real acceptance runs into a testable implementation order.

## Dependency order

Implement strictly in the following order; do not start a step before the previous one is complete:

```
probe/models → probe/classifier
     ↓              ↓
     └──────→ batch/service ←── host/channel_patch
                   ↓
          cmd/cli/batch_import
                   ↓
     tests/integration/batch_import
```text
probe/models + probe/aliases
        ↓
probe/capability + probe/completion
        ↓
batch/provider_id + batch/capability_profile
        ↓
host/channel_patch + batch/run_state
        ↓
batch/service
        ↓
batch/confirmation
        ↓
app/http_batch_import + app/http_batch_runs
        ↓
cmd/cli/batch_import
        ↓
tests/integration/batch_import
```

## Stage 1: probe module (upstream probing)
Key principles:

- First establish "what the upstream really returns", then decide what to write to the host
- First model "compatibility capabilities" explicitly, then decide how to split traffic across `/responses`, `/chat/completions`, and the Anthropic-compatible entry point
- First model the "async confirmation window", then discuss the final `active/degraded/broken`
- First model "how state is persisted and projected", then build the result pages; pages read only the run-state store and do not stitch together live host responses

## Stage 1: probe module (upstream discovery)

### 1.1 `internal/probe/models.go`

**Responsibility**: call `GET {base_url}/v1/models` and parse the OpenAI-format response.
**Responsibility**: call `GET {base_url}/v1/models`, parse the OpenAI-compatible response, and return the list of raw model IDs.

```go
// ProviderModels returns the list of model IDs from a provider's /v1/models endpoint.
func ProviderModels(ctx context.Context, baseURL, apiKey string) ([]string, error)
type ModelsResult struct {
	RawModels  []string
	HTTPStatus int
	LatencyMs  int64
	Error      string
}

// Classifier errors into:
// - ErrAuthFailed : 401/403
// - ErrRateLimited : 429
// - ErrUpstreamUnreachable : 502/503/timeout/connection
// - ErrUnexpected : other HTTP errors
func ProviderModels(ctx context.Context, baseURL, apiKey string) (*ModelsResult, error)
```

**Unit tests**:
Error classification:

- `ErrAuthFailed`: 401/403
- `ErrRateLimited`: 429
- `ErrUpstreamUnreachable`: 502/503/timeout/connection
- `ErrUnexpected`: other HTTP / decode errors

**Unit tests**

```go
func TestProviderModels_OpenAIFormat_ReturnsModelList(t *testing.T)
func TestProviderModels_FilterOutNonChatModels(t *testing.T)
func TestProviderModels_EmptyData_ReturnsEmptySlice(t *testing.T)
func TestProviderModels_AuthFailed_ReturnsErrAuthFailed(t *testing.T)
func TestProviderModels_Timeout_ReturnsErrUpstreamUnreachable(t *testing.T)
func TestProviderModels_RecordsLatency(t *testing.T)
```

### 1.2 `internal/probe/classifier.go`
### 1.2 `internal/probe/aliases.go`

**Responsibility**: classify HTTP responses/errors and return a structured ProbeResult.
**Responsibility**: normalize model names, removing differences in case, vendor prefix, and common formatting, and produce a recommended model.

```go
type ProbeResult struct {
	URL            string
	HTTPStatus     int
	Models         []string
	Classification string // "auth_failed" | "rate_limited" | "unreachable" | "ok"
	LatencyMs      int64
	Error          string
type AliasResult struct {
	Raw        string
	Normalized string
	Canonical  string
}

func NormalizeModelID(raw string) string
func CanonicalModelID(raw string) string
func BuildAliasTable(rawModels []string) map[string]AliasResult
func ResolveRequestedModel(requested string, rawModels []string) (resolved string, ok bool)
```

**Unit tests**:
Normalization covers at least:

- Case normalization
- `vendor/model` prefix stripping
- Dot/hyphen differences
- Typical manual typos, for example `m27` vs `m2.7`

**Unit tests**

```go
func TestClassify_401_ReturnsAuthFailed(t *testing.T)
func TestClassify_429_ReturnsRateLimited(t *testing.T)
func TestClassify_502_ReturnsUpstreamUnreachable(t *testing.T)
func TestClassify_200_ReturnsOk(t *testing.T)
func TestNormalizeModelID_MinimaxCanonical(t *testing.T)
func TestNormalizeModelID_DeepSeekVendorPrefix(t *testing.T)
func TestCanonicalModelID_KimiCaseInsensitive(t *testing.T)
func TestResolveRequestedModel_ExactHit(t *testing.T)
func TestResolveRequestedModel_UsesNormalizedAlias(t *testing.T)
func TestResolveRequestedModel_MissReturnsFalse(t *testing.T)
```

### 1.3 `internal/probe/completion.go`
### 1.3 `internal/probe/capability.go`

**Responsibility**: iterate over the data returned by `/v1/models`, find the first model that can complete a chat completion, and run the smoke test.
**Responsibility**: run minimal compatibility probes against the same key and generate a capability profile.

The probe targets include at least:

- `GET /v1/models`
- `POST /v1/chat/completions`
- `POST /v1/responses`
- `POST /v1/messages` (Anthropic compatible)

```go
// FindSmokeModel traverses the model list and returns the first model
// that successfully completes a chat completion request.
func FindSmokeModel(ctx context.Context, baseURL, apiKey string, models []string) (model string, result *CompletionResult, err error)

// DefaultModelPricing returns a minimal pricing entry for a model
// (used when upstream has no pricing data).
type DefaultModelPricing struct {
	Model      string
	PricePer1M float64 // default: 0 (unset)
	MaxBatch   int     // default: 0 (unset)
type CapabilityProfile struct {
	SupportsOpenAIModels          bool
	SupportsOpenAIChatCompletions bool
	SupportsOpenAIResponses       bool
	SupportsAnthropicMessages     bool
	SupportsStream                string
	SupportsTools                 string
	SupportsReasoningFields       string
	AuthStyle                     string
	ModelIDStyle                  string
	KnownAdvisories               []string
}

func ProbeCapabilities(ctx context.Context, baseURL, apiKey string, rawModels []string) (*CapabilityProfile, error)
```

**Unit tests**:
Constraints:

- For third-party OpenAI-compatible upstreams, `/responses=403` must not be mechanically classified as `supported`
- It must be possible to record `responses_unsupported_but_chat_ok`
- It must be possible to record `initial_probe_race_expected`

**Unit tests**

```go
func TestFindSmokeModel_FirstModelSucceeds_ReturnsIt(t *testing.T)
func TestFindSmokeModel_FirstFailsSecondSucceeds_SkipsFirst(t *testing.T)
func TestFindSmokeModel_AllFail_ReturnsErrNoUsableModel(t *testing.T)
func TestFindSmokeModel_TimeoutBudget_StopsAfterLimit(t *testing.T)
func TestDefaultModelPricing_ReturnsZeroValues(t *testing.T)
func TestProbeCapabilities_Responses403Chat200_MarksResponsesUnsupported(t *testing.T)
func TestProbeCapabilities_AnthropicMessages200_MarksSupported(t *testing.T)
func TestProbeCapabilities_ModelsOnly_MarksPartialProfile(t *testing.T)
func TestProbeCapabilities_RecordsKnownAdvisories(t *testing.T)
```

---
### 1.4 `internal/probe/completion.go`

**Responsibility**: choose a smoke model from the discovered models and run a minimal completion test.

```go
type CompletionResult struct {
	Model          string
	HTTPStatus     int
	LatencyMs      int64
	Classification string
	Error          string
}

func ResolveSmokeModel(requested []string, rawModels []string, profile *CapabilityProfile) (string, error)
func SmokeCompletion(ctx context.Context, baseURL, apiKey, model string, profile *CapabilityProfile) (*CompletionResult, error)
```

Rules:

- Prefer `ResolveRequestedModel`
- If the manually specified model is invalid, fall back automatically to a model the upstream really serves
- If the profile is known not to support `/responses`, go directly to raw `/chat/completions`

**Unit tests**

```go
func TestResolveSmokeModel_UsesRequestedAliasWhenMatched(t *testing.T)
func TestResolveSmokeModel_FallsBackToDiscoveredModel(t *testing.T)
func TestSmokeCompletion_ResponsesUnsupported_UsesChatCompletions(t *testing.T)
func TestSmokeCompletion_AllCandidatesFail_ReturnsErrNoUsableModel(t *testing.T)
```

## Stage 2: batch module (batch import orchestration)

### 2.1 `internal/batch/provider_id.go`

**Decision**: option B, use the full URL as part of the provider_id (`{normalized_host}-{url_hash_last8}`).
**Responsibility**: normalize a URL into a stable `provider_id`.

```go
// NormalizeProviderID converts a base URL into a stable provider ID using host + hash.
// https://api.deepseek.com/v1 → api-deepseek-<last8-of-url-hash>
// Collision-resistant: same full URL always produces the same ID.
func NormalizeProviderID(baseURL string) string {
	u, _ := url.Parse(baseURL)
	host := strings.ToLower(strings.ReplaceAll(u.Host, ":", "-"))
	hash := fmt.Sprintf("%x", md5.Sum([]byte(baseURL)))[:8]
	return host + "-" + hash
}
func NormalizeProviderID(baseURL string) string
```

**Unit tests**:
Strategy:

- Use the host as the main body
- Use a hash of the full URL to prevent collisions
- The same host with different paths produces different IDs

**Unit tests**

```go
func TestNormalizeProviderID_Basic(t *testing.T)
func TestNormalizeProviderID_WithPath_IncludesPathHash(t *testing.T)
func TestNormalizeProviderID_Idempotent(t *testing.T)
func TestNormalizeProviderID_DifferentPaths_DifferentIDs(t *testing.T) // v1 vs v2 produce different hashes
func TestNormalizeProviderID_SanitizesPort(t *testing.T)
func TestNormalizeProviderID_DifferentPaths_DifferentIDs(t *testing.T)
```

### 2.2 `internal/batch/channel_evolution.go`
### 2.2 `internal/batch/capability_profile.go`

**Responsibility**: compute the difference between the channel's existing model_mapping and the newly probed models, and return what needs to be patched.
**Responsibility**: map the Stage 1 capability profile to an import strategy.

```go
// ModelMappingDelta computes which models need to be added to an existing channel.
func ModelMappingDelta(existing []string, discovered []string) (add []string)
type ImportRoutingStrategy struct {
	UseRawChatCompletions   bool
	SkipResponsesChecks     bool
	RetryInitial503         bool
	TreatProbe403AsAdvisory bool
}

// BuildPatchModelMapping returns the full patched model_mapping for a channel.
func BuildPatchModelMapping(existing models map[string]string, add []string) map[string]string
func BuildImportRoutingStrategy(profile *probe.CapabilityProfile) ImportRoutingStrategy
```

**Unit tests**:
**Unit tests**

```go
func TestBuildImportRoutingStrategy_ResponsesUnsupported_UsesRawChat(t *testing.T)
func TestBuildImportRoutingStrategy_ProbeRaceAdvisory_EnablesProbe403Advisory(t *testing.T)
func TestBuildImportRoutingStrategy_WarmupExpected_Enables503Retry(t *testing.T)
```

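One way the profile-to-strategy mapping could look is sketched below. The struct fields and advisory strings come from the spec and plan; which advisory switches on which flag is an assumption of this sketch, `hasAdvisory` is a hypothetical helper, and the local `CapabilityProfile` carries only the two fields the sketch needs.

```go
package main

import "fmt"

// CapabilityProfile holds only the fields this sketch reads.
type CapabilityProfile struct {
	SupportsOpenAIResponses bool
	KnownAdvisories         []string
}

// ImportRoutingStrategy mirrors the struct in the plan.
type ImportRoutingStrategy struct {
	UseRawChatCompletions   bool
	SkipResponsesChecks     bool
	RetryInitial503         bool
	TreatProbe403AsAdvisory bool
}

// hasAdvisory reports whether the profile lists the given advisory.
func hasAdvisory(p *CapabilityProfile, name string) bool {
	for _, a := range p.KnownAdvisories {
		if a == name {
			return true
		}
	}
	return false
}

// BuildImportRoutingStrategy derives routing flags from a profile.
// A nil profile yields the zero strategy (no special handling).
func BuildImportRoutingStrategy(p *CapabilityProfile) ImportRoutingStrategy {
	if p == nil {
		return ImportRoutingStrategy{}
	}
	return ImportRoutingStrategy{
		UseRawChatCompletions:   !p.SupportsOpenAIResponses,
		SkipResponsesChecks:     !p.SupportsOpenAIResponses,
		RetryInitial503:         hasAdvisory(p, "gateway_no_available_accounts_warmup"),
		TreatProbe403AsAdvisory: hasAdvisory(p, "initial_account_probe_race"),
	}
}

func main() {
	profile := &CapabilityProfile{
		SupportsOpenAIResponses: false,
		KnownAdvisories:         []string{"initial_account_probe_race"},
	}
	fmt.Printf("%+v\n", BuildImportRoutingStrategy(profile))
}
```

Keeping the function pure (profile in, flags out) lets the three unit tests listed above run without any HTTP mocking.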
### 2.3 `internal/batch/channel_evolution.go`

**Responsibility**: compare the channel's existing models with the newly probed models and compute the patch.

```go
func ModelMappingDelta(existing []string, discovered []string) (add []string)
func BuildPatchModelMapping(existing map[string]string, aliases map[string]probe.AliasResult) map[string]string
func BuildPatchModelPricing(models []string) map[string]any
```

Requirements:

- Both sides of a mapping, the upstream raw model and the gateway canonical model, must remain traceable
- A patch must not break existing models

**Unit tests**

```go
func TestModelMappingDelta_NoOverlap_AddsAll(t *testing.T)
func TestModelMappingDelta_FullOverlap_ReturnsEmpty(t *testing.T)
func TestModelMappingDelta_PartialOverlap_AddsMissingOnly(t *testing.T)
func TestBuildPatchModelMapping_AddsWithIdentityMapping(t *testing.T)
func TestBuildPatchModelMapping_PreservesExistingEntries(t *testing.T)
func TestBuildPatchModelMapping_AddsCanonicalAliases(t *testing.T)
```

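The delta computation is a set difference over model names. The sketch below follows the `ModelMappingDelta` signature from the plan; de-duplicating repeated discovered names and sorting the result for deterministic output are assumptions added here, not requirements stated in the plan.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelMappingDelta returns the discovered models that are not yet
// present in the existing channel mapping. Duplicates in discovered
// are reported once, and the result is sorted.
func ModelMappingDelta(existing []string, discovered []string) (add []string) {
	seen := make(map[string]struct{}, len(existing))
	for _, m := range existing {
		seen[m] = struct{}{}
	}
	for _, m := range discovered {
		if _, ok := seen[m]; ok {
			continue
		}
		seen[m] = struct{}{} // also guards against repeated discovered names
		add = append(add, m)
	}
	sort.Strings(add)
	return add
}

func main() {
	existing := []string{"kimi-k2.6"}
	discovered := []string{"minimax-m2.7-highspeed", "kimi-k2.6", "deepseek-v4-pro"}
	fmt.Println(ModelMappingDelta(existing, discovered))
	// prints: [deepseek-v4-pro minimax-m2.7-highspeed]
}
```

Because existing entries are never touched, a patch built from this delta only adds keys, which is consistent with the requirement that a patch must not break existing models.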
### 2.3 `internal/batch/service.go`
### 2.4 `internal/batch/service.go`

**Responsibility**: orchestrate the Stage 1 + 2 + 3 pipeline, calling probe + provision + access.
**Responsibility**: orchestrate the four stages Probe → Provision → Async Confirm → Validate.

```go
type BatchImportRequest struct {
	BaseURL         string
	APIKey          string
	RequestedModels []string
	AccessMode      string
}

type BatchImportService struct {
	host      hostadapter.HostAdapter
	probe     *probe.Client
	provision *provision.ImportService
	Host      hostadapter.HostAdapter
	Probe     *probe.Client
	Provision *provision.ImportService
	Confirm   *ConfirmationService
}

func (s *BatchImportService) ImportBatch(ctx context.Context, req BatchImportRequest) (*BatchImportResult, error)
```

**Unit tests** (mock external HTTP):
Responsibilities in detail:

- No longer trust `requested_models` as the final truth
- Must write `raw_models` / `normalized_models` / `resolved_smoke_model` into the result
- A `403 Forbidden` on the first account test may be handled as an advisory
- A brief `503 no available accounts` on the first gateway completion may be retried for a short time

**Unit tests**

```go
func TestBatchImport_AllProbeOk_ProvisionsAndValidates(t *testing.T)
func TestBatchImport_ProbeFails_SkipsProvision(t *testing.T)
func TestBatchImport_CompletionFail_ReportsBroken(t *testing.T)
func TestBatchImport_StrictMode_StopsOnFirstFailure(t *testing.T)
func TestBatchImport_RequestedModelMiss_UsesDiscoveredModel(t *testing.T)
func TestBatchImport_Probe403Race_DowngradesToWarning(t *testing.T)
func TestBatchImport_Initial503Warmup_RetriesBeforeBroken(t *testing.T)
func TestBatchImport_PartialMode_ContinuesOnFailure(t *testing.T)
func TestBatchImport_Idempotent_SkipsExistingAccount(t *testing.T)
func TestBatchImport_PersistsRunAndItemState(t *testing.T)
```

---
### 2.5 `internal/batch/run_state.go`
|
||||
|
||||
**Responsibility**: persist the per-stage results of each import run / item, and produce the summary projections that the pages read.
|
||||
|
||||
```go
|
||||
type ImportRunState struct {
|
||||
RunID string
|
||||
State string
|
||||
TotalItems int
|
||||
ActiveItems int
|
||||
DegradedItems int
|
||||
BrokenItems int
|
||||
StartedAt time.Time
|
||||
UpdatedAt time.Time
|
||||
FinishedAt *time.Time
|
||||
}
|
||||
|
||||
type ImportRunItemState struct {
|
||||
RunID string
|
||||
ItemID string
|
||||
BaseURL string
|
||||
ProviderID string
|
||||
CurrentStage string
|
||||
StageStatus string
|
||||
RetryCount int
|
||||
AdvisoryMessages []string
|
||||
LastErrorStage string
|
||||
LastError string
|
||||
}
|
||||
|
||||
type RunStateStore interface {
|
||||
CreateRun(ctx context.Context, run ImportRunState) error
|
||||
UpdateRun(ctx context.Context, run ImportRunState) error
|
||||
UpsertItem(ctx context.Context, item ImportRunItemState) error
|
||||
ListRuns(ctx context.Context, limit int) ([]ImportRunState, error)
|
||||
ListRunItems(ctx context.Context, runID string) ([]ImportRunItemState, error)
|
||||
}
|
||||
```
|
||||
|
||||
**Unit tests**
|
||||
|
||||
```go
|
||||
func TestRunStateStore_CreateAndUpdateRun(t *testing.T)
|
||||
func TestRunStateStore_UpsertItemPreservesLatestStage(t *testing.T)
|
||||
func TestRunStateStore_ListRunsReturnsSummary(t *testing.T)
|
||||
func TestRunStateStore_ListRunItemsReturnsRetryAndAdvisory(t *testing.T)
|
||||
```
|
||||
|
||||
## Stage 3: host adapter extensions

### 3.1 `internal/host/sub2api/channel.go`

Additions:
|
||||
|
||||
```go
|
||||
// PatchChannel extends an existing channel's model_mapping with additional models.
|
||||
func (h *HostAdapter) PatchChannel(ctx context.Context, channelID int64, addModels []string) error
|
||||
func (h *HostAdapter) PatchChannelPricing(ctx context.Context, channelID int64, pricing map[string]any) error
|
||||
```
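The merge that `PatchChannel` is expected to perform (add missing models, keep existing entries) can be sketched as a pure function. This is an illustration under assumptions: `MergeModelMapping` is a hypothetical helper, and representing `model_mapping` as `map[string]string` with identity entries for new models is assumed, not confirmed by the host API.

```go
package main

import "fmt"

// MergeModelMapping returns a copy of existing, extended with an identity
// entry (model -> model) for every model in addModels that is not already
// mapped. Existing entries are never overwritten and existing is not mutated.
func MergeModelMapping(existing map[string]string, addModels []string) map[string]string {
	out := make(map[string]string, len(existing)+len(addModels))
	for k, v := range existing {
		out[k] = v
	}
	for _, m := range addModels {
		if m == "" {
			continue // ignore empty model names
		}
		if _, ok := out[m]; !ok {
			out[m] = m
		}
	}
	return out
}

func main() {
	// "custom-target" is a made-up mapping target used only for the demo.
	existing := map[string]string{"deepseek-chat": "custom-target"}
	merged := MergeModelMapping(existing, []string{"deepseek-chat", "deepseek-reasoner"})
	fmt.Println(merged)
}
```

Keeping the merge free of HTTP concerns makes the "preserves existing entries" case testable without `httptest`.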
|
||||
|
||||
**Unit tests** (httptest):

**Unit tests**
|
||||
|
||||
```go
|
||||
func TestPatchChannel_AddsModelMappingEntries(t *testing.T)
|
||||
func TestPatchChannel_ChannelNotFound_ReturnsError(t *testing.T)
|
||||
func TestPatchChannel_PreservesExistingEntries(t *testing.T)
|
||||
func TestPatchChannelPricing_AddsNewModels(t *testing.T)
|
||||
```
|
||||
|
||||
---
|
||||
### 3.2 `internal/host/sub2api/accounts.go`
|
||||
|
||||
## Stage 4: CLI
|
||||
If the current host adapter's account test / models read logic cannot expose advisory information, a minimal enhancement is needed:
|
||||
|
||||
### 4.1 `cmd/cli/batch_import.go`
|
||||
```go
|
||||
type AccountProbeSnapshot struct {
|
||||
Models []string
|
||||
ProbeStatus string
|
||||
ProbeAdvisory bool
|
||||
ProbeMessage string
|
||||
}
|
||||
|
||||
func (h *HostAdapter) GetAccountProbeSnapshot(ctx context.Context, accountID int64) (*AccountProbeSnapshot, error)
|
||||
```
|
||||
|
||||
**Unit tests**
|
||||
|
||||
```go
|
||||
func TestGetAccountProbeSnapshot_403RaceCapturedAsAdvisory(t *testing.T)
|
||||
func TestGetAccountProbeSnapshot_ReturnsModelsAndMessage(t *testing.T)
|
||||
```
|
||||
|
||||
## Stage 4: async confirm module

### 4.1 `internal/batch/confirmation.go`

**Responsibility**: explicitly model the window in which the account has just been created but the host's asynchronous behavior has not yet settled.
|
||||
|
||||
```go
|
||||
type ConfirmationStatus string
|
||||
|
||||
const (
|
||||
ConfirmationPending ConfirmationStatus = "pending"
|
||||
ConfirmationActive ConfirmationStatus = "confirmed_active"
|
||||
ConfirmationWarning ConfirmationStatus = "confirmed_warning"
|
||||
ConfirmationBroken ConfirmationStatus = "confirmed_broken"
|
||||
)
|
||||
|
||||
type ConfirmationService struct {
|
||||
Host hostadapter.HostAdapter
|
||||
}
|
||||
|
||||
func (s *ConfirmationService) ConfirmAccount(ctx context.Context, req ConfirmationRequest) (*ConfirmationResult, error)
|
||||
```
|
||||
|
||||
Behavior:

- Query `/accounts/:id/models` first
- If the models are already correct but `/test` returns a first-time `403 Forbidden` and the profile indicates that third-party responses are unsupported, classify it as an advisory
- If the first completion returns `503 no available accounts`, retry briefly within the budget
- Finally, classify the result as one of:
  - `confirmed_active`
  - `confirmed_warning`
  - `confirmed_broken`
|
||||
|
||||
**Unit tests**
|
||||
|
||||
```go
|
||||
func TestConfirmAccount_ModelsOkProbe403Race_ReturnsWarning(t *testing.T)
|
||||
func TestConfirmAccount_Initial503Then200_ReturnsActive(t *testing.T)
|
||||
func TestConfirmAccount_AllRetriesExhausted_ReturnsBroken(t *testing.T)
|
||||
func TestConfirmAccount_RecordsRetryAttempts(t *testing.T)
|
||||
```
|
||||
|
||||
## Stage 5: result API and pages

### 5.1 `internal/app/http_batch_import.go`

**Responsibility**: expose the run-result query API.
|
||||
|
||||
```go
|
||||
func (a *App) listBatchImportRuns(w http.ResponseWriter, r *http.Request)
|
||||
func (a *App) getBatchImportRun(w http.ResponseWriter, r *http.Request)
|
||||
func (a *App) listBatchImportRunItems(w http.ResponseWriter, r *http.Request)
|
||||
func (a *App) getBatchImportRunItem(w http.ResponseWriter, r *http.Request)
|
||||
```
|
||||
|
||||
The response must include at least:

- run summary
- item current stage
- normalized/resolved model information
- confirmation / access status
- advisory / retry / last_error

**Unit tests**
|
||||
|
||||
```go
|
||||
func TestListBatchImportRuns_ReturnsSummary(t *testing.T)
|
||||
func TestGetBatchImportRun_ReturnsRunDetail(t *testing.T)
|
||||
func TestListBatchImportRunItems_ReturnsProjectedItems(t *testing.T)
|
||||
func TestGetBatchImportRunItem_ReturnsAdvisoryAndRetryTrail(t *testing.T)
|
||||
```
|
||||
|
||||
### 5.2 `internal/app/http_batch_runs.go`
|
||||
|
||||
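**Responsibility**: provide minimal page output. A complex front end is not required, but a reader must be able to understand the result directly.

Pages:

- `/batch-import/runs`
- `/batch-import/runs/{run_id}`

**Page requirements**

- The list page shows the overall state of each run
- The detail page shows item-level state, model correction results, a capability profile summary, and the reasons for warning / broken

**Unit tests**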
|
||||
|
||||
```go
|
||||
func TestBatchImportRunsPage_RendersRunSummary(t *testing.T)
|
||||
func TestBatchImportRunDetailPage_RendersItemStates(t *testing.T)
|
||||
```
|
||||
|
||||
## Stage 6: CLI
|
||||
|
||||
### 6.1 `cmd/cli/batch_import.go`
|
||||
|
||||
```bash
|
||||
go run ./cmd/cli batch-import \
|
||||
--host-base-url string (required)
|
||||
--host-api-key string (required)
|
||||
--entry "url,key" (single entry; mutually exclusive with --batch-file)
--batch-file string (path to the batch file)
|
||||
--mode "strict" | "partial" (default: partial)
|
||||
--access-mode "subscription" | "self_service" (default: subscription)
|
||||
--host-base-url string \
|
||||
--host-api-key string \
|
||||
--entry "url,key" \
|
||||
--batch-file string \
|
||||
--mode "strict|partial" \
|
||||
--access-mode "subscription|self_service" \
|
||||
--requested-model string \
|
||||
--confirm-timeout duration
|
||||
```
|
||||
|
||||
**File format**:

- `--batch-file`: CSV, one `base_url,api_key` pair per line (comma-separated; blank lines are ignored; lines starting with `#` are comments)
|
||||
Additional requirements:

**Output format**:
|
||||
```json
|
||||
{
|
||||
"batch_id": "batch-20260521-001",
|
||||
"total": 3,
|
||||
"active": 2,
|
||||
"broken": 1,
|
||||
"degraded": 0,
|
||||
"results": [
|
||||
{"url": "https://api.deepseek.com", "provider_id": "api-deepseek",
|
||||
"upstream_models": ["deepseek-chat", "deepseek-reasoner"],
|
||||
"channel_id": 10, "account_id": 20,
|
||||
"probe_ok": true, "access_status": "active", "error": null},
|
||||
{"url": "https://api.fail.com", "provider_id": "api-fail",
|
||||
"upstream_models": [], "probe_ok": false,
|
||||
"access_status": "broken", "error": "upstream_unreachable"}
|
||||
]
|
||||
}
|
||||
- The output must include:
  - `run_id`
  - `result_page`
  - `raw_models`
  - `normalized_models`
  - `resolved_smoke_model`
  - `capability_profile`
  - `confirmation_status`
- If a manually entered model name does not match, the CLI must explicitly report the "recommended model name"

**CLI integration tests**
|
||||
|
||||
```go
|
||||
func TestBatchImportCLI_ReportsResolvedModel(t *testing.T)
|
||||
func TestBatchImportCLI_ReportsCapabilityProfile(t *testing.T)
|
||||
func TestBatchImportCLI_ReportsConfirmationStatus(t *testing.T)
|
||||
func TestBatchImportCLI_ReportsRunResultPage(t *testing.T)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 5: Integration tests

## Stage 7: Integration tests

### `tests/integration/batch_import_test.go`

Use a real httptest server to simulate the upstream provider:

Use real SQLite plus a fake/httptest upstream, covering:

1. Successful import from a standard OpenAI-compatible upstream
2. A manually entered model name is wrong but resolves through an alias
3. A third-party compatibility scenario where `/responses` returns 403 and `/chat/completions` returns 200
4. The first `/accounts/:id/test` returns 403 and is later resolved as an advisory
5. The first `/v1/chat/completions` returns `503 no available accounts` and returns `200` after a retry
6. Capability-profile-driven routing
7. Run / item state is queryable while the import is in progress
8. Historical run results remain viewable after a control-plane restart
|
||||
|
||||
```go
|
||||
func TestBatchImport_FullPipeline(t *testing.T)
|
||||
func TestBatchImport_StrictStopsOnFailure(t *testing.T)
|
||||
func TestBatchImport_PartialContinuesOnFailure(t *testing.T)
|
||||
func TestBatchImport_IdempotentOnDuplicateURLKey(t *testing.T)
|
||||
func TestBatchImport_RequestedModelTypo_IsAutoCorrected(t *testing.T)
|
||||
func TestBatchImport_ThirdPartyResponsesUnsupported_StillSucceeds(t *testing.T)
|
||||
func TestBatchImport_ProbeRace_BecomesWarningNotBroken(t *testing.T)
|
||||
func TestBatchImport_Initial503Warmup_RetrySucceeds(t *testing.T)
|
||||
func TestBatchImport_RunStatusIsQueryableDuringExecution(t *testing.T)
|
||||
func TestBatchImport_RunResultSurvivesRestart(t *testing.T)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Acceptance commands
|
||||
|
||||
```bash
|
||||
go test ./internal/probe/... -v -count=1
|
||||
go test ./internal/batch/... -v -count=1
|
||||
go test ./internal/host/sub2api/... -v -count=1 -run TestPatchChannel
|
||||
go test ./tests/integration/batch_import_test.go -v -count=1
|
||||
go test ./internal/probe/... -v -count=1
|
||||
go test ./internal/batch/... -v -count=1
|
||||
go test ./internal/app/... -v -count=1
|
||||
go test ./internal/host/sub2api/... -v -count=1
|
||||
go test ./tests/integration/... -count=1
|
||||
go test -cover ./internal/... -count=1
|
||||
go vet ./...
|
||||
gofmt -l .
|
||||
```
|
||||
|
||||
Coverage targets:
|
||||
|
||||
- `internal/probe`: >= 80%
|
||||
- `internal/batch`: >= 75%
|
||||
|
||||
---
|
||||
- `internal/provision`: >= 75%
|
||||
|
||||
## Task checklist
|
||||
|
||||
- [ ] `internal/probe/models.go` + models_test.go
|
||||
- [ ] `internal/probe/classifier.go` + classifier_test.go
|
||||
- [ ] `internal/probe/completion.go` + completion_test.go
|
||||
- [ ] `internal/batch/provider_id.go` + provider_id_test.go
|
||||
- [ ] `internal/batch/channel_evolution.go` + channel_evolution_test.go
|
||||
- [ ] `internal/host/sub2api/channel.go` PatchChannel + test
|
||||
- [ ] `internal/batch/service.go` + service_test.go
|
||||
- [ ] `internal/probe/models.go`
|
||||
- [ ] `internal/probe/aliases.go`
|
||||
- [ ] `internal/probe/capability.go`
|
||||
- [ ] `internal/probe/completion.go`
|
||||
- [ ] `internal/batch/provider_id.go`
|
||||
- [ ] `internal/batch/capability_profile.go`
|
||||
- [ ] `internal/batch/channel_evolution.go`
|
||||
- [ ] `internal/batch/service.go`
|
||||
- [ ] `internal/batch/confirmation.go`
|
||||
- [ ] `internal/batch/run_state.go`
|
||||
- [ ] `internal/host/sub2api/channel.go`
|
||||
- [ ] `internal/host/sub2api/accounts.go`
|
||||
- [ ] `internal/app/http_batch_import.go`
|
||||
- [ ] `internal/app/http_batch_runs.go`
|
||||
- [ ] `cmd/cli/batch_import.go`
|
||||
- [ ] `tests/integration/batch_import_test.go`
|
||||
- [ ] Full quality gate (gofmt / vet / test / race / cover)
- [ ] Update `EXECUTION_BOARD.md` to track V2 implementation status
|
||||
|
||||
696
docs/2026-05-22-BATCH_AUTO_IMPORT_V2_ARCHITECTURE.md
Normal file
@@ -0,0 +1,696 @@
|
||||
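# V2 Technical Architecture: Batch Auto-Import

Date: 2026-05-22

Status: in design

Related documents:

- `docs/2026-05-21-BATCH_AUTO_IMPORT_SPEC.md`
- `docs/2026-05-21-BATCH_AUTO_IMPORT_TDD_PLAN.md`
- `docs/EXECUTION_BOARD.md`

## 1. Purpose of this document

This document answers four questions:

1. Which components make up V2 within the system
2. How run state is persisted, and how import jobs are made more stable
3. What exactly the result pages show, and how the fields are organized
4. How the backend, storage, and pages should divide responsibilities during implementation

This document does not repeat the product motivation in the `SPEC`, nor does it replace the coding order in the `TDD plan`. Its only job is to lay out V2's technical structure and result page design clearly.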
|
||||
|
||||
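## 2. Design goals

V2 must satisfy all five of the following goals:

1. **Stable import**
   - The full probe / provision / confirm / validate flow for a single URL + key is controllable
   - In a batch import, a single failed item must not cause the state of the whole batch to be lost

2. **Recoverable state**
   - After a control-plane restart, historical run and item results remain viewable
   - Transient failures must leave an explicit retry trail

3. **Model correction**
   - Manually entered model names are not the final source of truth
   - Names must be normalized, matched, and recommended automatically, based on what the key actually returns when probed

4. **Compatibility routing**
   - For domestic (Chinese) models / third-party OpenAI-compatible upstreams, a capability profile must be recorded
   - Subsequent routing, confirmation logic, and warning explanations all depend on the profile

5. **Visible results**
   - No longer rely on the CLI and logs alone
   - The control plane must provide a run list page and a run detail page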
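The model correction goal (goal 3 above) can be illustrated with a deliberately simple normalization: lowercase the ID and keep only letters and digits, then match against what the upstream returned. This is a sketch, not the planned `aliases.go` logic; `canonical` and `ResolveModel` are hypothetical names, and a real implementation would also need to handle two upstream IDs that collapse to the same canonical form.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// canonical lowercases a model ID and drops everything except letters and
// digits, so "MiniMax-M2.7-highspeed" and "minimax-m27-highspeed" compare equal.
func canonical(id string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(id) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// ResolveModel returns the upstream model ID that matches requested.
// An exact match wins; otherwise the first upstream ID with the same
// canonical form is returned as the recommended name.
func ResolveModel(requested string, upstream []string) (string, bool) {
	want := canonical(requested)
	if want == "" {
		return "", false
	}
	for _, m := range upstream {
		if m == requested {
			return m, true
		}
	}
	for _, m := range upstream {
		if canonical(m) == want {
			return m, true
		}
	}
	return "", false
}

func main() {
	upstream := []string{"deepseek-chat", "MiniMax-M2.7-highspeed"}
	resolved, ok := ResolveModel("minimax-m27-highspeed", upstream)
	fmt.Println(resolved, ok)
}
```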
|
||||
|
||||
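## 3. Non-goals

This version of the architecture explicitly does not cover:

- Automatic load balancing across multiple keys
- An advanced scheduling system
- Real-time WebSocket push
- A complex front-end workbench
- Automatic price discovery and automatic repricing
- Direct connections to, or reads/writes of, the host database

## 4. Logical components

Inside the control plane, V2 is split into six logical layers: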
|
||||
|
||||
```text
|
||||
operator input
|
||||
↓
|
||||
batch import orchestrator
|
||||
├── probe layer
|
||||
├── capability profiler
|
||||
├── provision adapter
|
||||
├── confirmation engine
|
||||
├── validation engine
|
||||
└── run state store
|
||||
↓
|
||||
result projection
|
||||
↓
|
||||
HTTP API / result pages / CLI output
|
||||
```
|
||||
|
||||
### 4.1 Probe Layer
|
||||
|
||||
Responsibilities:

- Call the upstream `/v1/models`
- Probe `/v1/chat/completions`
- Probe `/v1/responses`
- Probe `/v1/messages`
- Extract the raw model list
- Compute the normalized model list

Inputs:

- `base_url`
- `api_key`
- `requested_models` (optional)

Outputs:
|
||||
|
||||
- `raw_models`
|
||||
- `normalized_models`
|
||||
- `resolved_smoke_model`
|
||||
- `capability_profile`
|
||||
- `probe_summary`
|
||||
|
||||
### 4.2 Provision Adapter
|
||||
|
||||
Responsibilities:

- Compute `provider_id`
- Find or create the target channel
- Patch `model_mapping`
- Patch `model_pricing`
- Create or update the account
- Bind the provider resource relationships

Inputs:

- `normalized_models`
- `capability_profile`
- `resolved_smoke_model`

Outputs:
|
||||
|
||||
- `channel_id`
|
||||
- `account_id`
|
||||
- `provider_id`
|
||||
- `managed_resources`
|
||||
|
||||
### 4.3 Confirmation Engine
|
||||
|
||||
Responsibilities:

- Handle the host's asynchronous probe window
- Handle the first-time `403 probe race`
- Handle the first-time `503 no available accounts`
- Decide `confirmation_status`

Output states:
|
||||
|
||||
- `confirmed_active`
|
||||
- `confirmed_warning`
|
||||
- `confirmed_broken`
|
||||
|
||||
### 4.4 Validation Engine
|
||||
|
||||
Responsibilities:

- Issue a real `/v1/chat/completions` request to the host gateway using the managed key
- Decide the final `access_status`
- Write the validation result into the result page projection
|
||||
|
||||
### 4.5 Run State Store
|
||||
|
||||
Responsibilities:

- Persist the stage state of each run and item
- Persist retries, warnings, and the stage in which an error occurred
- Serve as a data source the result pages can read directly
|
||||
|
||||
### 4.6 Result Projection
|
||||
|
||||
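Responsibilities:

- Turn low-level run state into summaries that both operators and developers can read directly
- Produce the run list statistics
- Produce the item detail view
- Keep internal implementation noise out of the output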
|
||||
|
||||
## 5. Runtime state model

### 5.1 Run-level state
|
||||
|
||||
`ImportRun.state`
|
||||
|
||||
- `running`
|
||||
- `completed`
|
||||
- `completed_with_warnings`
|
||||
- `failed`
|
||||
- `cancelled`
|
||||
|
||||
`ImportRun` core fields:

| Field | Meaning |
|---|---|
| `run_id` | Primary key of one batch import job |
| `mode` | `strict` / `partial` |
| `access_mode` | `subscription` / `self_service` |
| `total_items` | Total number of items |
| `completed_items` | Number of completed items |
| `active_items` | Number of items that ended as active |
| `degraded_items` | Number of items that ended as degraded |
| `broken_items` | Number of items that ended as broken |
| `state` | Overall run state |
| `started_at` | Start time |
| `updated_at` | Last update time |
| `finished_at` | Finish time |
|
||||
|
||||
### 5.2 Item-level state
|
||||
|
||||
`ImportRunItem.current_stage`
|
||||
|
||||
- `probe`
|
||||
- `provision`
|
||||
- `confirm`
|
||||
- `validate`
|
||||
- `done`
|
||||
|
||||
`ImportRunItem.stage_status`
|
||||
|
||||
- `discovered`
|
||||
- `provisioned`
|
||||
- `confirming`
|
||||
- `confirmed_active`
|
||||
- `confirmed_warning`
|
||||
- `confirmed_broken`
|
||||
|
||||
`ImportRunItem` core fields:

| Field | Meaning |
|---|---|
| `item_id` | ID of a single import record |
| `base_url` | URL of the current import target |
| `provider_id` | Automatically generated provider identifier |
| `requested_models` | Model names requested by the operator |
| `raw_models` | Raw model list from the upstream |
| `normalized_models` | Model list after normalization |
| `resolved_smoke_model` | Model finally used for the smoke test |
| `capability_profile_json` | Capability profile |
| `channel_id` | Host channel |
| `account_id` | Host account |
| `confirmation_status` | Confirmation status |
| `access_status` | Final access status |
| `retry_count` | Total retries so far |
| `advisory_messages` | List of warnings / advisories |
| `last_error_stage` | Stage of the most recent error |
| `last_error` | Text of the most recent error |
| `created_at` | Creation time |
| `updated_at` | Update time |
|
||||
|
||||
## 6. State store design

The recommendation is to add two tables to the control-plane SQLite database:
|
||||
|
||||
### 6.1 `import_runs`
|
||||
|
||||
```text
|
||||
id TEXT PRIMARY KEY
|
||||
mode TEXT NOT NULL
|
||||
access_mode TEXT NOT NULL
|
||||
state TEXT NOT NULL
|
||||
total_items INTEGER NOT NULL
|
||||
completed_items INTEGER NOT NULL
|
||||
active_items INTEGER NOT NULL
|
||||
degraded_items INTEGER NOT NULL
|
||||
broken_items INTEGER NOT NULL
|
||||
started_at DATETIME NOT NULL
|
||||
updated_at DATETIME NOT NULL
|
||||
finished_at DATETIME NULL
|
||||
```
|
||||
|
||||
Indexes:
|
||||
|
||||
- `idx_import_runs_started_at`
|
||||
- `idx_import_runs_state`
|
||||
|
||||
### 6.2 `import_run_items`
|
||||
|
||||
```text
|
||||
id TEXT PRIMARY KEY
|
||||
run_id TEXT NOT NULL
|
||||
base_url TEXT NOT NULL
|
||||
provider_id TEXT NOT NULL
|
||||
current_stage TEXT NOT NULL
|
||||
stage_status TEXT NOT NULL
|
||||
requested_models_json TEXT NOT NULL
|
||||
raw_models_json TEXT NOT NULL
|
||||
normalized_models_json TEXT NOT NULL
|
||||
resolved_smoke_model TEXT NOT NULL
|
||||
capability_profile_json TEXT NOT NULL
|
||||
channel_id INTEGER NULL
|
||||
account_id INTEGER NULL
|
||||
confirmation_status TEXT NOT NULL
|
||||
access_status TEXT NOT NULL
|
||||
retry_count INTEGER NOT NULL
|
||||
advisory_messages_json TEXT NOT NULL
|
||||
last_error_stage TEXT NULL
|
||||
last_error TEXT NULL
|
||||
created_at DATETIME NOT NULL
|
||||
updated_at DATETIME NOT NULL
|
||||
```
|
||||
|
||||
Indexes:
|
||||
|
||||
- `idx_import_run_items_run_id`
|
||||
- `idx_import_run_items_provider_id`
|
||||
- `idx_import_run_items_stage_status`
|
||||
- `idx_import_run_items_access_status`
|
||||
|
||||
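Constraints:

- All pages and APIs read only these two control-plane tables
- The host database is never read directly
- Host state changes are written back through the control plane's own confirmation and validation results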
|
||||
|
||||
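## 7. Stability mechanisms

### 7.1 Per-stage persistence

Each item is persisted immediately every time it completes a stage:

- Write `discovered` when probe completes
- Write `provisioned` when provision completes
- Write `confirming` when confirm starts
- Write `confirmed_*` when confirm/validate ends

Why this is valuable:

- The history trail is not lost if the process is interrupted
- The page can show which stage an item is stuck at
- If resume is supported later, the stage boundaries are already available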
|
||||
|
||||
### 7.2 Separating advisory from blocking

V2 must explicitly distinguish between:

- **blocking error**
  - Actually prevents execution from continuing
- **advisory**
  - Does not prevent completion, but the reason must be shown

Typical advisories:

- A first-time `403 Forbidden` probe race on a third-party upstream
- A first-time `503 no available accounts` that recovers after a retry
- `/responses` is unsupported but `/chat/completions` works
|
||||
|
||||
### 7.3 Retry Policy
|
||||
|
||||
Not every failure is retried.

Suggested policy:

| Error type | Handling |
|---|---|
| `401/403 unauthorized` | Do not retry; fail immediately |
| First-time `403 probe race` | Advisory; wait for asynchronous confirmation, then test again |
| `429 rate_limit` | Record a warning; optionally retry a limited number of times |
| First-time `503 no available accounts` | Retry briefly |
| `502/503 upstream unreachable` | Retry a limited number of times according to policy |
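One way to encode the policy table is a single classification function. The sketch below is an assumption-laden illustration: the `Failure` fields, the `Action` values, and the rule for telling a probe race apart from a genuine `403` (first attempt, models already correct, Responses known to be unsupported) are inferred from the confirmation behavior described in the TDD plan, and treating unknown statuses as failures is a choice made here, not a stated requirement.

```go
package main

import "fmt"

// Action is what the pipeline should do with a failed request.
type Action string

const (
	ActionFail         Action = "fail"          // blocking: stop this item
	ActionAdvisory     Action = "advisory"      // record, wait, test again
	ActionRetryShort   Action = "retry_short"   // brief warm-up retry
	ActionRetryLimited Action = "retry_limited" // bounded retry per policy
)

// Failure describes one failed call (hypothetical shape).
type Failure struct {
	Status               int
	FirstAttempt         bool
	ModelsOK             bool // /accounts/:id/models already looks correct
	ResponsesUnsupported bool // capability profile says /responses is unsupported
	NoAvailableAccounts  bool // host reported "no available accounts"
}

// Classify maps a failure onto the retry policy table.
func Classify(f Failure) Action {
	switch f.Status {
	case 403:
		if f.FirstAttempt && f.ModelsOK && f.ResponsesUnsupported {
			return ActionAdvisory // probe race, not a credential problem
		}
		return ActionFail
	case 429, 502:
		return ActionRetryLimited
	case 503:
		if f.FirstAttempt && f.NoAvailableAccounts {
			return ActionRetryShort
		}
		return ActionRetryLimited
	default:
		return ActionFail // includes 401 and anything unrecognized
	}
}

func main() {
	fmt.Println(Classify(Failure{Status: 403, FirstAttempt: true, ModelsOK: true, ResponsesUnsupported: true}))
	fmt.Println(Classify(Failure{Status: 401}))
}
```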
|
||||
|
||||
Every retry must record:

- `retry_count`
- `last_error_stage`
- `last_error`
- The time of the most recent retry
|
||||
|
||||
### 7.4 Restart Safety
|
||||
|
||||
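After a control-plane restart, at minimum:

- The historical run list is still viewable
- Historical item details are still viewable
- If a run stopped midway, the page shows the stage where it last stopped

The first phase of V2 may skip automatic resume, but it must make the failure scene visible.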
|
||||
|
||||
## 8. Result page design

V2 provides at least two pages:

- `/batch-import/runs`
- `/batch-import/runs/{run_id}`

### 8.1 Batch list page

Page goals:

- Quickly see which import batches ran recently
- Quickly tell which batch succeeded, which has warnings, and which failed
|
||||
|
||||
#### Page structure

```text
Page title: Batch Import Runs

[Filter area]
- State filter: all / running / completed / warning / failed
- access_mode filter: all / subscription / self_service
- Time range
- Keyword search: run_id / provider_id / base_url

[List table]
- Run ID
- Started At
- Finished At
- Mode
- Access Mode
- Total
- Active
- Degraded
- Broken
- State
- Actions
```
|
||||
|
||||
#### Field layout

| Column | Description |
|---|---|
| `Run ID` | Clickable; opens the detail page |
| `Started At` | Start time |
| `Finished At` | Finish time; empty while running |
| `Mode` | `strict` / `partial` |
| `Access Mode` | `subscription` / `self_service` |
| `Total` | Total number of items |
| `Active` | Shown in green |
| `Degraded` | Shown in yellow |
| `Broken` | Shown in red |
| `State` | Overall run state badge |
| `Actions` | View details |
|
||||
|
||||
#### State display

- `running`: blue badge
- `completed`: green badge
- `completed_with_warnings`: yellow badge
- `failed`: red badge
- `cancelled`: gray badge

### 8.2 Batch detail page

Page goals:

- Quickly answer "which URL in this batch had what problem"
- Show model correction, compatibility, warnings, and the retry trail
|
||||
|
||||
#### Page structure

```text
Page title: Batch Import Run Detail

[Header summary card]
- Run ID
- State
- Started At / Finished At
- Mode / Access Mode
- Total / Active / Degraded / Broken

[Item list table]
- Item ID
- Base URL
- Provider ID
- Requested Models
- Resolved Smoke Model
- Current Stage
- Confirmation Status
- Access Status
- Retry Count
- Last Error Stage
- Last Error
- Actions

[Side panel / detail area opened by clicking an item]
- URL and basic provider information
- Model correction result
- Capability profile summary
- channel/account resource bindings
- advisory messages
- retry timeline
- Raw error and final conclusion
```
|
||||
|
||||
#### Item table fields

| Column | Description |
|---|---|
| `Item ID` | Identifier of a single record |
| `Base URL` | Current upstream |
| `Provider ID` | Automatically generated provider |
| `Requested Models` | Operator input |
| `Resolved Smoke Model` | Model finally used for the smoke test |
| `Current Stage` | `probe/provision/confirm/validate/done` |
| `Confirmation Status` | `confirmed_active/warning/broken` |
| `Access Status` | `active/degraded/broken` |
| `Retry Count` | Number of retries so far |
| `Last Error Stage` | Stage of the most recent error |
| `Last Error` | Summary of the most recent error |
|
||||
|
||||
#### Item detail fields

1. **Basic information**
   - `base_url`
   - `provider_id`
   - `channel_id`
   - `account_id`

2. **Model information**
   - `requested_models`
   - `raw_models`
   - `normalized_models`
   - `resolved_smoke_model`
   - `recommended_models` (if a correction occurred)

3. **Compatibility summary**
   - `supports_openai_models`
   - `supports_openai_chat_completions`
   - `supports_openai_responses`
   - `supports_anthropic_messages`
   - `model_id_style`
   - `known_advisories`

4. **Confirmation and access results**
   - `confirmation_status`
   - `probe_ok`
   - `access_status`
   - `retry_count`
   - `last_error_stage`
   - `last_error`

5. **Notes**
   - Why a warning is a warning
   - Why a broken item is broken
   - Whether manual intervention is recommended for this item
|
||||
|
||||
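### 8.3 Page interaction principles

- Show the summary first by default, not the raw JSON
- The detail area gives priority to the "conclusion" and the "reason"
- The raw capability profile / model list can be collapsed
- Every warning must be explainable in one sentence a person can read

Examples:

- `responses_unsupported_but_chat_ok`
  - Page text: `This upstream does not support /v1/responses; the system automatically fell back to /v1/chat/completions`

- `initial_probe_race_expected`
  - Page text: `The host's asynchronous probe had not settled after account creation; the first /test result was handled as an advisory`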
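The two advisory examples above suggest a simple code-to-sentence lookup in the projection layer. A minimal sketch, assuming a hypothetical `ExplainAdvisory` helper and English wording chosen here for illustration; the fallback text for unknown codes is also an assumption.

```go
package main

import "fmt"

// advisoryText maps advisory codes to one-sentence explanations.
// Only the two codes named in this document are listed.
var advisoryText = map[string]string{
	"responses_unsupported_but_chat_ok": "This upstream does not support /v1/responses; the system automatically fell back to /v1/chat/completions",
	"initial_probe_race_expected":       "The host's asynchronous probe had not settled after account creation; the first /test result was handled as an advisory",
}

// ExplainAdvisory returns the readable sentence for a code, or a generic
// fallback so that an unknown code never renders as an empty cell.
func ExplainAdvisory(code string) string {
	if text, ok := advisoryText[code]; ok {
		return text
	}
	return "Unrecognized advisory code: " + code
}

func main() {
	fmt.Println(ExplainAdvisory("responses_unsupported_but_chat_ok"))
	fmt.Println(ExplainAdvisory("something_else"))
}
```

Keeping the wording in one table is one way to meet the "unify warning wording" responsibility that section 10 assigns to the projection layer.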
|
||||
|
||||
## 9. API design

### 9.1 List API

`GET /api/batch-import/runs`

Response structure:
|
||||
|
||||
```json
|
||||
{
|
||||
"runs": [
|
||||
{
|
||||
"run_id": "batch-20260522-001",
|
||||
"state": "completed_with_warnings",
|
||||
"mode": "partial",
|
||||
"access_mode": "subscription",
|
||||
"total_items": 12,
|
||||
"active_items": 9,
|
||||
"degraded_items": 2,
|
||||
"broken_items": 1,
|
||||
"started_at": "2026-05-22T10:00:00+08:00",
|
||||
"finished_at": "2026-05-22T10:02:12+08:00"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 9.2 Detail API

`GET /api/batch-import/runs/{run_id}`

Returns:

- run summary
- count summary
- recent warnings
|
||||
|
||||
### 9.3 Item list API

`GET /api/batch-import/runs/{run_id}/items`

Supported filters:

- `stage_status`
- `access_status`
- `provider_id`
- `has_warning`
|
||||
|
||||
### 9.4 Item detail API

`GET /api/batch-import/runs/{run_id}/items/{item_id}`

Returns:

- Full item detail
- capability profile
- advisory
- retry trail
|
||||
|
||||
## 10. Backend module mapping

Suggested module layout:

### `internal/batch/run_state.go`

Responsibilities:

- Repository interface for run / item state
- Run summary aggregation
- Base functions for state projection

### `internal/batch/status_projection.go`

Responsibilities:

- Convert low-level fields into page-friendly summaries
- Unify warning wording
- Unify the badge / state mapping

### `internal/app/http_batch_import.go`

Responsibilities:

- Run list API
- Run detail API
- Item list API
- Item detail API

### `internal/app/http_batch_runs.go`

Responsibilities:

- Render the minimal result pages
- Stay unaware of probe/provision details
- Depend only on the projection layer's output
|
||||
|
||||
## 11. Page sketches

### 11.1 List page sketch
|
||||
|
||||
```text
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Batch Import Runs |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Status: [all v] Access Mode: [all v] Search: [__________________________] |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Run ID | State | Total | Active | Degraded | Broken | Started At |
|
||||
| batch-20260522-001 | warning | 12 | 9 | 2 | 1 | 10:00:00 |
|
||||
| batch-20260522-002 | running | 4 | 1 | 0 | 0 | 10:05:13 |
|
||||
+----------------------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
### 11.2 Detail page sketch
|
||||
|
||||
```text
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Batch Import Run: batch-20260522-001 |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| State: completed_with_warnings Mode: partial Access: subscription |
|
||||
| Total: 12 Active: 9 Degraded: 2 Broken: 1 |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Items |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| URL | Provider | Stage | Confirm | Access | Retry | ... |
|
||||
| api.deepseek.com | deepseek | done | warning | active | 1 | ... |
|
||||
| api.53hk.cn/v1 | minimax | done | active | active | 0 | ... |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Selected Item Detail |
|
||||
+----------------------------------------------------------------------------------+
|
||||
| Requested Models: [minimax-m27-highspeed] |
|
||||
| Resolved Smoke Model: MiniMax-M2.7-highspeed |
|
||||
| Capability: responses=false, chat=true, messages=false |
|
||||
| Advisory: first /test 403 handled as an async probe race                         |
|
||||
| Last Error Stage: confirm |
|
||||
| Last Error: API returned 403: Forbidden |
|
||||
+----------------------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
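## 12. Implementation advice

Proceed in the smallest shippable order:

1. First implement the `run_state` tables and repo
2. Then have the batch service write state at every stage
3. Then build the API
4. Finally build the minimal pages

Do not build the pages first and add the state store afterwards. Otherwise the pages would end up depending on logs and in-memory runtime objects again, and would have to be torn down and rebuilt later.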
|
||||
|
||||
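## 13. Acceptance criteria

Once this architecture is implemented, it should at least satisfy:

1. Run and item state can be queried while an import is in progress
2. After an import completes, each item's result can be reviewed through the pages
3. The pages show whether model correction occurred
4. The pages show the key summary of the capability profile
5. Warning and broken states have readable explanations
6. Historical results remain viewable after a control-plane restart
7. The pages and API depend only on the control-plane state store, not on the host database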
|
||||
@@ -145,8 +145,19 @@
|
||||
|
||||
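**Current phase**: 🔨 In design (pending review and refinement)

**Document**: `docs/2026-05-21-BATCH_AUTO_IMPORT_SPEC.md` (requirements spec)

**Document**: `docs/2026-05-21-BATCH_AUTO_IMPORT_SPEC.md` (requirements spec)

**TDD plan**: `docs/2026-05-21-BATCH_AUTO_IMPORT_TDD_PLAN.md` (implementation path; open questions confirmed)

**Technical architecture**: `docs/2026-05-22-BATCH_AUTO_IMPORT_V2_ARCHITECTURE.md` (runtime state store, result pages, API, page field layout)

**Design convergence this round**:

- Three classes of high-frequency problems seen in real acceptance testing have been written into the v2 plan:
  - Model name normalization and correction when adding models
  - Compatibility capability profiles for third-party domestic (Chinese) models (`/responses`, `/chat/completions`, Anthropic compatible, stream/tools)
  - The asynchronous confirmation window after adding an account (first-time `403` probe race, first-time `503 no available accounts` warm-up)
- Two productization capabilities have been added to v2:
  - run / item state persistence, retry trail, and viewing historical results after a control-plane restart
  - Batch list page / batch detail page for viewing model correction results, account state, provider warnings, and the final access status
- The v2 goal has been raised from "synchronous import succeeds" to "import + asynchronous confirmation + final closed-loop verification"

**Design work remaining**:

- [ ] **Technical design**: API interfaces (CLI + HTTP), data model, DB schema changes, error handling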
|
||||
|
||||