Downgrade the first third-party account test 403 to an advisory warning when models are already present, and retry transient gateway completion 503 responses during access closure. Add regression coverage for the probe race and completion retry paths, update the execution board, and store the final v0.1.129 Kimi A7M fresh-host acceptance artifact that now reaches succeeded/active/subscription_ready.
sub2api-cn-relay-manager Execution Board
Date: 2026-05-22
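Current Gate: APPROVED (the code gates have passed, and on 2026-05-21 the remaining issues with the account probe, the gateway probe authentication semantics, and the latest-head self_service fresh-host re-verification were closed out. The latest MiniMax 53hk fresh-host acceptance `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import/21-summary.json`, the DeepSeek 2166 subscription fresh-host acceptance `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import/21-summary.json`, and the latest-head self_service standard fresh-host acceptance `artifacts/real-host-acceptance/20260521_210403/05-import.json` / `07-access-status.json` together prove that both the subscription and self_service main paths close the loop to ready on a real fresh host, and that the host's `/v1/models` and `/v1/chat/completions` both genuinely return HTTP 200. The `reconcile=drifted` that still appears reflects only leftover historical resources in the shared fresh-host environment and does not block the PRD first-release sign-off.)
Goal: deliver an independent control plane that is non-intrusive to the host, can import Chinese domestic models, and provides an operable import / rollback / access loop.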
Current truth as of 2026-05-22
- The main directory `artifacts/real-host-acceptance/` now keeps only final evidence; historical debugging samples have been moved to `artifacts/real-host-acceptance-archive/`
- Access ready semantics are now settled as: `/v1/models` hits `smoke_test_model` and a minimal `POST /v1/chat/completions` smoke succeeds; a models-only false ready can no longer occur
- The `subscription` main path has passed the latest fresh-host re-verification:
  - MiniMax 53hk: `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import/21-summary.json`
  - DeepSeek 2166: `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import/21-summary.json`
  - Kimi A7M (local host `v0.1.129`): `artifacts/real-host-acceptance/20260522_122706_local_v0129_kimi_a7m_subscription_freshhost/21-summary.json`
- The `self_service` main path has passed the latest-head standard fresh-host re-verification:
  - `artifacts/real-host-acceptance/20260521_210403/05-import.json`
  - `artifacts/real-host-acceptance/20260521_210403/07-access-status.json`
- The official provider validation matrix still carries one non-blocking fact:
  - `artifacts/real-host-acceptance/20260521_222212_remote43_minimax-m2-7-official_key_import/21-summary.json` proves the official MiniMax template path works, but the validation key currently hits upstream `429`
- `reconcile=drifted` may still appear on the shared fresh host, but the current interpretation is "noise from leftover historical resources", which does not block the PRD first-release sign-off
- Debugging details and diagnostic lessons are captured in:
  - `docs/REAL_HOST_ACCEPTANCE_LEARNINGS.md`
  - `docs/REAL_HOST_ARTIFACT_RETENTION.md`
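The access ready rule stated above can be sketched as a small predicate. This is a minimal illustration, not the project's code: the `smokeResult` type and the `isAccessReady` function are hypothetical names, and the only behavior taken from the board is that a model-list hit alone must not count as ready.

```go
package main

// smokeResult is a hypothetical record of one access smoke run.
type smokeResult struct {
	Models       []string // model IDs returned by GET /v1/models
	CompletionOK bool     // true when the minimal POST /v1/chat/completions smoke succeeded
}

// isAccessReady reports whether access may be recorded as ready.
// Both conditions must hold: the model list contains smokeTestModel,
// and the completion smoke succeeded. A models-only hit is not enough.
func isAccessReady(r smokeResult, smokeTestModel string) bool {
	listed := false
	for _, m := range r.Models {
		if m == smokeTestModel {
			listed = true
			break
		}
	}
	return listed && r.CompletionOK
}
```

Keeping the two conditions in one predicate makes the "models-only false ready" case impossible to express by accident.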
Completed this round
- Unified host identity model
  - Host registration persists `auth_type` / `auth_token`
  - The import / reconcile / rollback-provider / access runtime paths now use `host_id` as the primary key
  - provider status / resources / access status / import-batches support `host_id` as a query dimension
- managed_resources scoped per host
  - Added migration `0004_host_identity_and_managed_resources.sql`
  - The `managed_resources` unique key is widened to `(host_id, resource_type, host_resource_id)`
  - Repository and service queries now use host-scoped semantics
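To illustrate why the unique key includes `host_id`, here is a minimal in-memory sketch of the same constraint. The `resourceKey` and `resourceIndex` types are hypothetical, and the string field types are an assumption; the real schema may differ.

```go
package main

// resourceKey mirrors the (host_id, resource_type, host_resource_id) unique key.
// Field types are assumptions; the real schema may use integers.
type resourceKey struct {
	HostID         string
	ResourceType   string
	HostResourceID string
}

// resourceIndex is a hypothetical in-memory stand-in for the unique constraint.
type resourceIndex struct {
	seen map[resourceKey]struct{}
}

func newResourceIndex() *resourceIndex {
	return &resourceIndex{seen: make(map[resourceKey]struct{})}
}

// Add records k and returns true, or returns false if k is already present.
func (idx *resourceIndex) Add(k resourceKey) bool {
	if _, dup := idx.seen[k]; dup {
		return false
	}
	idx.seen[k] = struct{}{}
	return true
}
```

With this key, two hosts can each own a resource with the same `host_resource_id` without colliding, while a repeat on the same host is still rejected.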
- reconcile run results scoped per batch
  - Added migration `0006_reconcile_runs_batch_scope.sql`
  - `reconcile_runs` gains `batch_id`; batch detail returns only the reconcile records for that batch
- capability probe narrowed to side-effect-free probing
  - No longer sends an empty `POST` to real creation endpoints
- rollback-provider risk reduced
  - Rollback now prefers the recorded batch resources via `RollbackStoredResources()`
  - Dangerous deletion is refused when recorded resources are missing
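The refusal rule above can be sketched as a guard that only ever plans deletions from recorded resources. This is an illustration under assumptions: `planRollback`, `storedResource`, and `errNoStoredResources` are hypothetical names and do not describe the actual signature of `RollbackStoredResources()`.

```go
package main

import "errors"

// errNoStoredResources is a hypothetical sentinel for the refusal case.
var errNoStoredResources = errors.New("no recorded resources for batch; refusing to delete")

// storedResource is a hypothetical row recorded at import time.
type storedResource struct {
	ResourceType   string
	HostResourceID string
}

// planRollback returns the resources a rollback is allowed to delete.
// It only ever returns recorded resources and refuses when none exist,
// rather than guessing deletion targets on the host.
func planRollback(recorded []storedResource) ([]storedResource, error) {
	if len(recorded) == 0 {
		return nil, errNoStoredResources
	}
	plan := make([]storedResource, len(recorded))
	copy(plan, recorded)
	return plan, nil
}
```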
- Documentation synced to reality
  - Added `docs/2026-05-18-PRODUCTION_REMEDIATION_TASK_BOARD.md`
  - Scaled back the unimplemented `/metrics` / rate limiting / monitoring promises in `DEPLOYMENT.md`
- The current-code remote43 import path now has tunnel-aware validation
  - `scripts/import_remote43_provider.sh` adds `CRM_HOST_BASE`, which separates "the host address the operator uses" from "the host address the CRM process uses"
  - Key historical live model-mapping evidence is kept in:
    - `artifacts/real-host-acceptance/20260520_222713_crm18100_live_model_mapping_validation`
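- The current-code remote43 access gate root-cause fix has landed
  - subscription access is now a host-side closed loop: CRM no longer depends on an externally pre-supplied host regular-user key; instead, following the `subscription_users` selector, it creates or finds a managed regular user on the host, logs in to create a managed key, writes back allowed_groups / balance, and then performs the subscription assignment
  - Account creation requests now also write `credentials.model_mapping`, fixing the issue where `/v1/models` fell back to the GPT default set when reading the account model whitelist
  - Tests added or updated for: `internal/access`, `internal/provision`, `internal/host/sub2api`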
- current-code access ready semantics raised to the completion layer
  - `/v1/models` alone no longer decides `subscription_ready` / `self_service_ready`
  - The control plane records access as ready only when `/v1/models` hits `smoke_test_model` and the `/v1/chat/completions` smoke succeeds
  - access closure / import runtime artifact / reconcile rerun payload all persist `completion_ok` / `completion_status` / `completion_type` / `completion_preview`
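- The current-code remote43 acceptance script gains an upstream API evidence layer
  - `scripts/import_remote43_provider.sh` directly probes the upstream `/models` and `/chat/completions` behind the provider `base_url`
  - Added `21-summary.json`, which automatically triages completion failures into `host_compatibility_gap` or `upstream_key_quota_issue`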
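One plausible reading of the triage above is sketched below. The two category names come from this board, but the decision rule is an assumption: a host completion failure is attributed to the host when the direct upstream probe succeeds, and to the upstream key or quota otherwise. `triageCompletion` is a hypothetical function name.

```go
package main

// Category names come from the board; the decision rule is an assumption.
const (
	hostCompatibilityGap  = "host_compatibility_gap"
	upstreamKeyQuotaIssue = "upstream_key_quota_issue"
)

// triageCompletion classifies one smoke outcome.
// hostOK is the host-side completion result; upstreamOK is the direct
// upstream probe result. It returns "" when host completion succeeded,
// because there is no failure to classify.
func triageCompletion(hostOK, upstreamOK bool) string {
	switch {
	case hostOK:
		return ""
	case upstreamOK:
		return hostCompatibilityGap
	default:
		return upstreamKeyQuotaIssue
	}
}
```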
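- patched CRM external validation completed
  - On the patched CRM instance, both DeepSeek and MiniMax verified that "a passing completion smoke lands as succeeded/active, and a failing one is not wrongly recorded as ready"
  - `20260521_191418_remote43_minimax_key_import` and `20260521_201509_remote43_deepseek_key_import` together prove that the current `subscription` provider path closes the loop for real
  - `20260521_210403` proves that the latest-head `self_service` standard fresh-host acceptance also closes the loop to `self_service_ready / fully_ready`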
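- Artifact retention policy settled
  - The main directory `artifacts/real-host-acceptance/` currently keeps only final evidence
  - Historical failed / partially successful / trial-and-error samples have been moved to `artifacts/real-host-acceptance-archive/`
  - Classification rules: `docs/REAL_HOST_ARTIFACT_RETENTION.md`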
- relay-manager latest-head has closed both Kimi A7M races
  - The first account test `403 Forbidden` is downgraded to an advisory warning; as long as `/models` already hits `smoke_test_model`, the batch is no longer misjudged as a blocking failure
  - access closure adds a short completion retry for transient post-import `503 / no available accounts`, so the host's async probe / account warm-up window no longer marks a genuinely usable path as `broken`
  - `20260522_122706_local_v0129_kimi_a7m_subscription_freshhost` proves that, with the fixed relay-manager + patched host combination, `kimi-a7m / kimi-k2.6` reaches `batch_status=succeeded`, `provider_status=active`, `latest_access_status=subscription_ready`
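The short completion retry described above can be sketched as a bounded loop that retries only on `503`. This is a minimal illustration under assumptions: `retryCompletion`, its parameters, and the fixed delay are hypothetical, and the board does not state the real attempt count or backoff.

```go
package main

import (
	"net/http"
	"time"
)

// retryCompletion calls attempt up to maxAttempts times and returns the
// last HTTP status seen. It retries only on 503 Service Unavailable,
// sleeping for delay between attempts; any other status is returned
// immediately. With maxAttempts <= 0 it makes no call and returns 0.
func retryCompletion(attempt func() int, maxAttempts int, delay time.Duration) int {
	status := 0
	for i := 0; i < maxAttempts; i++ {
		status = attempt()
		if status != http.StatusServiceUnavailable {
			return status
		}
		if i < maxAttempts-1 {
			time.Sleep(delay)
		}
	}
	return status
}
```

Bounding the attempts keeps a genuinely broken path from being retried forever, while a warm-up window of a few seconds still resolves to the real status.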
Verified gates
- `gofmt -l .` ✅ empty output
- `go vet ./...` ✅
- `go test ./...` ✅
- `go test -race ./...` ✅
- `go test -cover ./internal/...` ✅
  - `internal/access`: `80.5%`
  - `internal/host/sub2api`: `78.1%`
  - `internal/pack`: `73.9%`
  - `internal/provision`: `76.3%`
  - `internal/store/sqlite`: `61.4%`
- `go test ./tests/integration/... -count=1` ✅
- `bash ./scripts/test_real_host_scripts.sh` ✅
Final evidence currently retained
- `artifacts/real-host-acceptance/20260520_222713_crm18100_live_model_mapping_validation`
  - Proves that account `credentials.model_mapping` is aligned with the live runtime
- `artifacts/real-host-acceptance/20260521_142211_crm18100_deepseek_completion_split`
  - Proves that a host completion failure and an upstream completion success can be separated
  - Key root-cause evidence for the completion triage logic
- `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import`
  - MiniMax 53hk `subscription` final success sample
  - `21-summary.json` has reached `batch_status=succeeded`, `provider_status=active`
- `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import`
  - DeepSeek 2166 `subscription` final success sample
  - `21-summary.json` has reached `batch_status=succeeded`, `provider_status=active`
- `artifacts/real-host-acceptance/20260521_210403`
  - latest-head `self_service` standard fresh-host acceptance final success sample
  - `05-import.json` = `succeeded/self_service_ready/active`
  - `07-access-status.json` = `latest_access_status=fully_ready`
- `artifacts/real-host-acceptance/20260521_222212_remote43_minimax-m2-7-official_key_import`
  - official MiniMax template live sample
  - The template path works, but the current validation key hits upstream `429`
- `artifacts/real-host-acceptance/20260522_122706_local_v0129_kimi_a7m_subscription_freshhost`
  - latest-head relay-manager against patched host `v0.1.129`: Kimi A7M `subscription` final success sample
  - `21-summary.json` has reached `batch_status=succeeded`, `provider_status=active`
  - `account_probe_summary` explicitly records `probe_advisory=true`, `validation_status=warning`, proving that relay-manager correctly downgrades the 403 probe race
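The probe downgrade recorded in `account_probe_summary` can be sketched as a small classifier. The `probe_advisory` and `validation_status=warning` values come from this board; the `probeVerdict` type, the `classifyAccountProbe` function, and the `"ok"` / `"failed"` status strings are assumptions for illustration. The sketch also does not model the "first attempt only" aspect of the real fix.

```go
package main

import "net/http"

// probeVerdict mirrors the two fields the board cites from account_probe_summary.
type probeVerdict struct {
	ProbeAdvisory    bool
	ValidationStatus string // "warning" comes from the board; "ok" and "failed" are assumed
}

// classifyAccountProbe decides how an account test result affects the batch.
// A 403 is treated as advisory only when the smoke test model is already
// listed by /models; any other non-2xx status stays a blocking failure.
func classifyAccountProbe(testStatus int, smokeModelListed bool) probeVerdict {
	switch {
	case testStatus >= 200 && testStatus < 300:
		return probeVerdict{ValidationStatus: "ok"}
	case testStatus == http.StatusForbidden && smokeModelListed:
		return probeVerdict{ProbeAdvisory: true, ValidationStatus: "warning"}
	default:
		return probeVerdict{ValidationStatus: "failed"}
	}
}
```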
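Remaining items (P2 / operational prerequisites; do not block launch within the PRD first-release scope)
- Operational prerequisites
  - Real host initialization does not create regular users automatically; before launch, regular users must be created explicitly and reusable credentials retained
  - `self_service` needs a regular-user key bound to the target standard group, and usually an available balance as well
  - `subscription` needs a subscription-type group + a regular-user subscription assignment + key/group binding
- Structural debt
  - access / reconcile are still not fully split into independent submodules as the implementation plan requires
  - There is still no built-in scheduler/jobs
- Deployment and environment limits
  - The standard multi-stage Dockerfile remains unstable in restricted network environments
  - Currently recommended: `scripts/build_local_image.sh` + `Dockerfile.local`
- Official provider validation matrix
  - The current official MiniMax live sample proves the template path works, but the validation key hits upstream `429`
  - Whether official providers such as Qwen / GLM / Kimi / Step pass live acceptance still depends on obtaining official keys and quota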
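Shortest follow-up paths
- To widen provider coverage, first obtain official keys per `docs/PROVIDER_VALIDATION_MATRIX.md`, then run official live acceptance
- To improve the shared fresh-host signal-to-noise ratio, clean up the leftover historical resources once to reduce `reconcile=drifted` noise
- To continue productization, take the `v2` batch auto-import design through review before starting implementation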
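v2 plan: Batch Auto-Import (URL + Key)
Current stage: 🔨 In design (pending review and refinement)
Document: `docs/2026-05-21-BATCH_AUTO_IMPORT_SPEC.md` (requirements specification)
TDD plan: `docs/2026-05-21-BATCH_AUTO_IMPORT_TDD_PLAN.md` (implementation path; open questions confirmed)
Design work remaining:
- Technical design: API interfaces (CLI + HTTP), data model, DB schema changes, error handling
- UI design: CLI output format / HTTP API documentation / web console (delivery form to be confirmed)
- Review: design documents reviewed by the relevant specialists
Implementation paused: coding starts only after the design review passes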
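Conclusions that must not be drawn
- ❌ Historical failed artifacts ≠ the current latest-head still fails
- ❌ capability probe being side-effect-free ≠ every host version is genuinely compatible
- ❌ rollback-provider moved to a safe path ≠ historical dirty resources disappear automatically
- ❌ `HTTP 200` ≠ host initialization automatically prepares regular users / subscriptions / balance; these remain explicit operational prerequisites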