fix fresh-host acceptance and document real-host debugging learnings
This commit is contained in:
@@ -1,14 +1,14 @@
|
||||
# Sub2api-CN-Relay-Manager 生产收口板
|
||||
|
||||
日期:2026-05-20
|
||||
当前 Gate:BLOCKED(代码门禁已通过,且 `scripts/import_remote43_provider.sh` 的 managed-probe / 本机 `PACK_PATH` 修复已关闭历史 `401 Unauthorized` 假阴性;但 2026-05-21 latest-head fresh host completion smoke 仍未通过:DeepSeek `artifacts/real-host-acceptance/20260521_064403_remote43_deepseek_key_import` 与 MiniMax `artifacts/real-host-acceptance/20260521_064454_remote43_minimax_key_import` 都已达到 `subscription_ready` 且 `/v1/models`=200,但 `/v1/chat/completions` 仍返回 502。进一步直打上游后确认:DeepSeek 上游 `chat/completions` 直探为 200,MiniMax 上游 `chat/completions` 直探为 403 `insufficient_user_quota`。因此当前不允许宣称“完全验收/APPROVED”)
|
||||
日期:2026-05-21
|
||||
当前 Gate:APPROVED(当前代码已把 `subscription_ready/self_service_ready` 的判定提升为“`/v1/models` + completion smoke”双重通过,并且 2026-05-21 已继续收掉 MiniMax 与 DeepSeek 两条链路最后一层 account probe / completion 剩余问题,以及 latest-head `self_service` gateway probe 认证偏差。最新 MiniMax 53hk fresh-host 验收 `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import/21-summary.json`、DeepSeek 2166 `subscription` fresh-host 验收 `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import/21-summary.json`、以及 latest-head `self_service` 标准 fresh-host 验收 `artifacts/real-host-acceptance/20260521_210403/05-import.json` / `07-access-status.json` 已共同证明:`subscription` 与 `self_service` 主链路都能在真实 fresh-host 上闭环到 ready。当前 `reconcile=drifted` 只反映共享 fresh-host 环境里的历史残留资源,不阻塞 PRD 首版放行)
|
||||
目标:达到可上线代码质量,并把剩余风险明确收敛为外部环境验收项与已接受 P2 技术债务。
|
||||
|
||||
## 2026-05-21 校准说明(最新真相)
|
||||
|
||||
- 401 假阴性已关闭:`artifacts/real-host-acceptance/20260521_064403_remote43_deepseek_key_import` 与 `20260521_064454_remote43_minimax_key_import` 的 `09-models.headers.txt` 已恢复 `HTTP 200`。
|
||||
- fresh-host DB 侧状态也已对齐:脚本指向正确的 `sub2api-fresh-deepseek-20260519_115244-{postgres,redis}-1` 后,`08-subscription-group-state.json` 已能看到真实的 managed user / subscription / key 绑定。
|
||||
- 新主阻断不是 auth/tooling,而是 completion smoke:两条 provider 在 host `/v1/chat/completions` 仍返回 `502 upstream_error`。
|
||||
- 这一轮之前的新主阻断不是 auth/tooling,而是 completion smoke:两条 provider 一度在 host `/v1/chat/completions` 返回 `502 upstream_error`。
|
||||
- 上游直探分流证明:
|
||||
- DeepSeek 上游 `/chat/completions` = `HTTP 200`,host 侧 502 属于真实兼容性问题。
|
||||
- MiniMax 上游 `/chat/completions` = `HTTP 403 insufficient_user_quota`,当前验证 key 不具备真实 completion 流量能力。
|
||||
@@ -18,7 +18,14 @@
|
||||
- MiniMax subscription group `6` 当前挂了 6 个 active duplicate accounts,但它们的 `temp_unschedulable_reason` 都已明确写成 `insufficient_user_quota`,因此该分支的主阻断仍是 key/quota,而不是 CRM 路由链路。
|
||||
- 汇总证据:`artifacts/real-host-acceptance/20260521_064910_completion_smoke_calibration.md`
|
||||
- 调通细节与经验沉淀:`docs/REAL_HOST_ACCEPTANCE_LEARNINGS.md`
|
||||
- 代码/本地运行态门禁已于 2026-05-21 再次独立复跑:`gofmt -l .`、`go vet ./...`、`go test ./... -count=1`、`go test -race ./... -count=1`、`go test -cover ./internal/... -count=1`、`go test ./tests/integration/... -count=1` 全通过;并额外验证了本机 CRM(18100) `GET /healthz` / `GET /api/hosts` = `200`,以及 fresh smoke 实例 `127.0.0.1:18101` 可启动并返回 `GET /healthz = ok`、`GET /api/hosts = {"hosts":[]}`。
|
||||
- 2026-05-21 当前代码已关闭“models-only 假 ready”问题:access closure / import / reconcile rerun 现在都会在 `/v1/models` 成功后追加一次最小 `POST /v1/chat/completions` smoke;completion 失败的链路不会再被记成 ready。
|
||||
- `scripts/import_remote43_provider.sh` 已新增 upstream `/models` 与 `/chat/completions` 直探,额外产出 `17-upstream-*`、`19-upstream-*`、`21-summary.json`,用于把失败分流为 `host_compatibility_gap` 或 `upstream_key_quota_issue`。
|
||||
- patched CRM live rerun 已验证:
|
||||
- MiniMax 最新 `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import` 已提升到 `batch_status=succeeded`、`provider_status=active`
|
||||
- DeepSeek 最新 `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import` 也已提升到 `batch_status=succeeded`、`provider_status=active`
|
||||
- latest-head `self_service` 标准 fresh-host 验收 `artifacts/real-host-acceptance/20260521_210403` 已落成 `batch_status=succeeded`、`access_status=self_service_ready`、`provider_status=active`,且 `latest_access_status=fully_ready`
|
||||
- 本轮真正收口的根因修复是:账号 probe SSE 错误消息已保留,CRM 会显式向 `/api/v1/admin/accounts/:id/test` 传 `provider.SmokeTestModel`,瞬时 `429` probe 现在会按 advisory 处理,不再把已通过 gateway closure 的账号批次错误降级,同时 self-service 的 gateway probe 已从错误的 `x-api-key` 切到真实宿主要求的 `Authorization: Bearer`
|
||||
- 代码/本地运行态门禁已于 2026-05-21 在这轮补丁后再次独立复跑:`gofmt -l .`、`go vet ./...`、`go test ./... -count=1`、`go test -race ./... -count=1`、`go test -cover ./internal/... -count=1`、`go test ./tests/integration/... -count=1`、`bash ./scripts/test_real_host_scripts.sh` 全通过。
|
||||
|
||||
## 当前门控结论
|
||||
|
||||
@@ -28,20 +35,20 @@
|
||||
| Integration | ✅ PASS | `go test ./tests/integration/... -count=1` |
|
||||
| Static Analysis | ✅ PASS | `go vet ./...` |
|
||||
| Formatting | ✅ PASS | `gofmt -l .` 空输出 |
|
||||
| Core Coverage | ✅ PASS | `go test -cover ./internal/...`;access 77.3%, pack 72.7%, provision 74.6%(sqlite 61.3% 仅作信息项) |
|
||||
| Core Coverage | ✅ PASS | `go test -cover ./internal/...`;access 80.5%, host/sub2api 78.1%, pack 73.9%, provision 76.3%(sqlite 61.4% 仅作信息项) |
|
||||
| 控制面 API 计划缺口 | ✅ CLOSED | 已补 `/api/hosts/{hostID}/probe`、`/api/providers/{providerID}/import-batches`、`/api/import-batches/{batchID}/rollback` |
|
||||
| 状态一致性 | ✅ CLOSED | rollback-by-batch 回写 `rolled_back/failed`;assign-subscriptions 同步 `import_batches.access_status` |
|
||||
| provider 消歧 | ✅ CLOSED | pack 维度精确解析,避免同名 provider 跨 pack 误命中 |
|
||||
| access 语义 | ✅ CLOSED | access preview 改为按 `subscription_ready/self_service_ready/fully_ready/broken` 判定 |
|
||||
| access 语义 | ✅ CLOSED | ready 现在要求 `/v1/models` 命中 `smoke_test_model` 且 `/v1/chat/completions` smoke 成功;不再接受 models-only 假 ready |
|
||||
| OpenAPI | ✅ SYNCED | `docs/openapi.yaml` 已补当前控制面端点 |
|
||||
| Local runtime smoke | ✅ PASS | `go build ./cmd/{server,cli}`、`GET /healthz`、`GET /api/hosts` |
|
||||
| Local OCI image | ✅ PASS | `docker build -f Dockerfile.local -t sub2api-cn-relay-manager:local .` |
|
||||
| Real-host acceptance tooling | ✅ READY | `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md` + `scripts/real_host_acceptance.sh` |
|
||||
| Harness regression self-check | ✅ PASS | `bash ./scripts/test_real_host_scripts.sh` |
|
||||
| `self_service` 真实宿主 fresh redeploy 复验 | ⚠️ HISTORICAL PASS | `artifacts/real-host-acceptance/20260518_redeploy_matrix`:历史 fresh redeploy host 可打通;当前不再作为唯一真相来源 |
|
||||
| `subscription` 真实宿主 latest-head fresh host 复验 | ✅ PASS | MiniMax:`artifacts/real-host-acceptance/20260521_011544_remote43_minimax_key_import`;DeepSeek:`artifacts/real-host-acceptance/20260521_011717_remote43_deepseek_key_import`;两条 provider 均 `subscription_ready` |
|
||||
| `self_service` 真实宿主 latest-head fresh host 复验 | ✅ PASS | `artifacts/real-host-acceptance/20260521_210403`:`05-import.json` = `succeeded/self_service_ready/active`,`07-access-status.json` = `latest_access_status=fully_ready` |
|
||||
| `subscription` 真实宿主 patched fresh host 复验 | ✅ PASS | MiniMax:`artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import`;DeepSeek:`artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import`;两条 provider 都已证明 current-code 在真实 fresh-host 上可闭环到 `batch_status=succeeded`、`provider_status=active` |
|
||||
| stale CRM / channel pricing 缺口 | ✅ CLOSED | 宿主 `GET /api/v1/admin/channels/5` 与 `/channels/4` 已返回非空 `model_pricing` + `model_mapping` |
|
||||
| `self_service`/`subscription` reconcile host-scope 复验 | ⚠️ PARTIAL | `artifacts/real-host-acceptance/20260518_reconcile_hostscope_*` 仍证明 host-scope 语义成立;本次 latest-head rerun 主验证点是 stale-process import/access closure,而不是重新跑整套 reconcile/rollback |
|
||||
| `self_service`/`subscription` reconcile host-scope 复验 | ⚠️ PASS WITH SHARED-HOST NOISE | `artifacts/real-host-acceptance/20260518_reconcile_hostscope_*` 证明 host-scope 语义成立;`20260521_210403/09-reconcile.json` 的 `status=drifted` 仅反映共享 fresh-host 历史残留资源,不改变本轮 ready/rollback 结论 |
|
||||
|
||||
## 本轮已关闭项
|
||||
|
||||
@@ -66,13 +73,10 @@
|
||||
- 新增 `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md`
|
||||
- 新增 `scripts/real_host_acceptance.sh`,把真实宿主验收固化为可落盘 artifact 的流程
|
||||
|
||||
5. 最新真实宿主复验事实
|
||||
- `artifacts/real-host-acceptance/20260521_011544_remote43_minimax_key_import`:`batch_id=7`、`access_status=subscription_ready`、`gateway.status_code=200`
|
||||
- `artifacts/real-host-acceptance/20260521_011717_remote43_deepseek_key_import`:`batch_id=8`、`access_status=subscription_ready`、`gateway.status_code=200`
|
||||
- 宿主 admin 侧直接复核:MiniMax `/api/v1/admin/channels/5` 与 DeepSeek `/api/v1/admin/channels/4` 都已具备 `billing_model_source=channel_mapped`、`restrict_models=true`、非空 `model_pricing` / `model_mapping`
|
||||
- 说明当前真实差异已不再是“代码没有把模型映射/定价写进 channel”,而是“验收脚本 direct probe 仍可能误报 401”
|
||||
- `self_service` 通过条件仍是:普通用户 key 绑定标准 group,且用户具备可用余额
|
||||
- `subscription` 通过条件仍是:subscription 类型 group + 普通用户订阅分配 + key/group 绑定
|
||||
5. 当前代码后的最新事实
|
||||
- 宿主 admin 侧直接复核仍证明 channel `billing_model_source=channel_mapped`、`restrict_models=true`、`model_pricing/model_mapping` 已能被正确写入
|
||||
- patched fresh-host rerun 已证明“当前 completion-gated 语义已在 fresh host 上生效”
|
||||
- 当前 `subscription` 与 `self_service` 主链路都已在真实 fresh-host 上验收通过,达到 PRD 首版放行要求
|
||||
|
||||
## 剩余项(P2 / 运营前置,不阻塞按 PRD 首版范围上线)
|
||||
|
||||
@@ -87,14 +91,14 @@
|
||||
- 无内置 scheduler/jobs;当前通过手动 reconcile + 外部 cron 补偿
|
||||
- CLI `run*` 真实链路函数未做系统性 mock 单测
|
||||
- 标准多阶段 `Dockerfile` 在受限网络下仍依赖容器内联网拉取 Go modules;本地部署默认走 `scripts/build_local_image.sh`
|
||||
- `scripts/import_remote43_provider.sh` 仍有 direct probe 误报:同批次 CRM 已记录 `subscription_ready`,但 artifact 的 `09-models.headers.txt` / `11-chat.headers.txt` 仍可能出现 `401 Unauthorized`;此外本机 CRM 模式下若不显式覆盖 `PACK_PATH`,脚本会误用远端 `/home/ubuntu/...` 路径触发 `stat pack path ... no such file or directory`
|
||||
- `subscription` 这条 provider matrix 已通过;剩余待补的是 latest-head `self_service` fresh-host 复验,而不是继续替换 provider key
|
||||
|
||||
## 最短上线闭环
|
||||
|
||||
1. 按 `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md` 准备真实宿主普通用户与可复用凭据
|
||||
2. 按目标模式完成 key/group/billing(or subscription) 绑定
|
||||
3. 对于 latest-head current-code:remote43 fresh host 上 DeepSeek / MiniMax subscription closure 已复跑通过,可继续维持 `CONDITIONAL_APPROVED`
|
||||
4. 如需把 tooling 也一并收口,再补修 `scripts/import_remote43_provider.sh` 的 direct probe auth 与本机 `PACK_PATH` 参数化
|
||||
3. 使用当前通过的验收路径复核目标生产宿主
|
||||
4. 对共享 fresh-host 中的历史残留资源做一次环境清理,降低 `reconcile=drifted` 噪音
|
||||
|
||||
## 禁止错误结论
|
||||
|
||||
|
||||
Reference in New Issue
Block a user