diff --git a/docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md b/docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md index fbfe8c15..f08e1cde 100644 --- a/docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md +++ b/docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md @@ -150,6 +150,27 @@ hook 执行时会额外导出: artifacts/real-host-acceptance// ``` +### Artifact 安全模式 + +默认: + +```bash +ARTIFACT_SECURITY_MODE=safe +ARTIFACT_INCLUDE_SECRETS=0 +``` + +含义: +- `safe`:主 artifact 目录只允许落脱敏后的验收证据,可作为仓库内长期保留材料。 +- `debug`:允许额外生成本地敏感调试材料;这类材料不得作为默认主证据提交或长期保留。 +- `ARTIFACT_INCLUDE_SECRETS=1` 只允许用于本地短时调试;一旦开启,产物不再默认视为可入库证据。 + +`safe` 模式下的硬规则: +- 不落完整 upstream / managed / user API key +- 不落完整 bearer token +- 不落可直接复用的 SQL 明文 key 语句 +- 不落 Redis cache key 原文 +- header 文件必须去掉 `Authorization` / `Cookie` / `Set-Cookie` / `x-api-key` + 默认文件顺序: - `01-create-host.json` - `02-probe-host.json` @@ -165,6 +186,22 @@ artifacts/real-host-acceptance// - `10-batch-detail.json` - `11-rollback.json`(若未跳过) +remote43 / 本地缩圈脚本若需要额外证据,会在同目录追加: +- `00-local-key-source.json`(只保留 redacted key 指纹/前后缀) +- `01-runtime-context.json`(仅保留 hash 后的 user/admin/managed 身份) +- `05-subscription-access-prep.summary.json`(替代默认明文 SQL) +- `07-redis-targeted-invalidation.json`(只保留失效动作结果,不保留 cache key 原文) +- `08-subscription-group-state.json`(已裁剪 key 明文) +- `21-summary.json` / `99-semantic-summary.json`(推荐长期保留的摘要证据) +- 若你是在清理旧目录,而不是生成新验收产物,优先运行: + ```bash + python3 scripts/migrate_historical_artifacts.py artifacts/real-host-acceptance + ``` + 它会把主目录中的历史敏感材料迁到 `artifacts/real-host-acceptance-sensitive/`,并在原目录生成安全摘要版。 +- 历史目录迁移脚本当前已覆盖两层: + 1. 固定命名标准 artifact(runtime-context / key-source / redis invalidation / group-state / sql summary / headers) + 2. 复杂业务快照与 JSON-in-string 字段(`summary.json`、`99-summary.json`、`99-semantic-summary.json`、`05a-batch-detail-pre-access.json`、`07-access-status.json`、`10-batch-detail.json` 以及其中的 `DetailsJSON/details_json/probe_summary_json`) +- 若迁移后仍看到类似 `00-managed-key-corrected.txt` 的手工 probe 文本,它们属于非标准人工产物,当前仍建议迁到 `artifacts/real-host-acceptance-sensitive/` 或直接删除。 ## 通过标准 至少同时满足: @@ -246,6 +283,9 @@ SKIP_ROLLBACK=1 scripts/real_host_acceptance.sh - 若 `curl -I --max-time 5 $CRM_HOST_BASE/healthz` 完全收不到 header - 就应先判定为 tunnel 失活或远端链路异常 - 这类现象会在 `03-import.body.json` 中表现为 `get host version ... context deadline exceeded` +25. 当前 artifact 安全模式默认是 `safe`: + - 主目录 `artifacts/real-host-acceptance/` 只能保留脱敏后证据 + - 若为了缩圈必须看原始 SQL / headers / token 相关细节,应显式切到 `ARTIFACT_SECURITY_MODE=debug` 并把产物视为本地敏感材料,不进入仓库主证据区 ## 建议固定执行的快速诊断顺序 diff --git a/docs/REAL_HOST_ARTIFACT_RETENTION.md b/docs/REAL_HOST_ARTIFACT_RETENTION.md index 10155f25..b354f1c8 100644 --- a/docs/REAL_HOST_ARTIFACT_RETENTION.md +++ b/docs/REAL_HOST_ARTIFACT_RETENTION.md @@ -18,6 +18,72 @@ - 只把“latest-head / fresh-host / current-code”直接相关的证据放进“最终证据” - 纯预跑、空目录、只到 preflight、已被后续更完整证据覆盖的目录,放入“可清理” +## Security 分层(新增硬规则) + +### A. `safe` artifact(可入库) +满足以下条件时,才允许留在 `artifacts/real-host-acceptance/` 并被文档直接引用: +- 不包含完整 upstream / managed / user API key +- 不包含完整 bearer token +- 不包含 Redis cache key 原文 +- 不包含可直接执行的明文 SQL 凭据操作 +- headers 已去掉 `Authorization` / `Cookie` / `Set-Cookie` / `x-api-key` +- 身份类字段(user_id / admin_user_id / managed_user_email)已改为 hash 或摘要 + +### B. `debug` artifact(本地敏感) +以下材料只能作本地短时调试使用,不得视为主证据入库: +- 明文 SQL(例如 subscription access prep 原始语句) +- 带完整 key/token 的 runtime context +- 未脱敏 headers / raw body +- Redis cache key 原文 + +### C. 解释规则 +- 文档中的“最终证据”默认指 `safe` artifact +- 若某目录只能在 `debug` 模式下复现价值,则应迁出主视图或只保留摘要版(例如 `21-summary.json` / `99-semantic-summary.json`) + +### D. 历史目录迁移脚本 +当历史 `artifacts/real-host-acceptance/` 目录里仍残留旧版敏感材料时,使用: + +```bash +python3 scripts/migrate_historical_artifacts.py artifacts/real-host-acceptance +``` + +默认行为: +- 原地把可安全化文件改写成 `safe` 版本 +- 把明文敏感文件移动到 sibling 目录: + - `artifacts/real-host-acceptance-sensitive/` +- 为旧的 `05-subscription-access-prep.sql` 生成: + - `05-subscription-access-prep.summary.json` +- 为旧的 `07-redis-targeted-invalidation.txt` 生成: + - `07-redis-targeted-invalidation.json` + +当前脚本会处理的典型历史文件: +- `00-local-key-source.json` +- `01-runtime-context.json` +- `00-context.json` +- `05-subscription-access-prep.sql` +- `07-redis-targeted-invalidation.txt` +- `08-subscription-group-state.json` +- `*.headers.txt` +- `00-managed-key.txt` +- `00-raw-user-key.txt` + +脚本不会替你决定“最终证据 / 归档 / 清理”的业务分类;它只负责把旧目录先迁移到可安全审阅的形态。 +- 第二轮迁移已补覆盖复杂 JSON 快照: + - `summary.json` + - `99-summary.json` + - `99-semantic-summary.json` + - `05a-batch-detail-pre-access.json` + - `07-access-status.json` + - `10-batch-detail.json` +- 对这类文件,脚本会递归处理常见敏感字段,并额外解析: + - `DetailsJSON` + - `details_json` + - `probe_summary_json` + 这三类 JSON-in-string 字段,先反序列化再脱敏后写回。 +- 当前仍需人工关注的残留主要是非标准手工文本快照,例如: + - `00-managed-key-corrected.txt` + - 其他不在固定命名集合中的手工 probe 文本 + 这些不影响“标准 artifact 已安全化”的结论,但不能直接把主目录视为 100% 无人工遗留。 ## 1. 可保留为最终证据 这些目录应长期保留,属于当前 Gate=`APPROVED` 的核心证据。 diff --git a/internal/provision/batch_detail_service_test.go b/internal/provision/batch_detail_service_test.go index fe8938b3..0674ecee 100644 --- a/internal/provision/batch_detail_service_test.go +++ b/internal/provision/batch_detail_service_test.go @@ -167,7 +167,10 @@ func TestDeriveProviderStatus(t *testing.T) { want string }{ {name: "recovered success beats stale reconcile", batchStatus: BatchStatusSucceeded, accessStatus: AccessStatusSelfServiceReady, reconcileStatus: "degraded", want: ProviderStatusActive}, + {name: "partial ready with active reconcile becomes active", batchStatus: BatchStatusPartial, accessStatus: AccessStatusSubscriptionReady, reconcileStatus: ProviderStatusActive, want: ProviderStatusActive}, {name: "succeeded batch", batchStatus: BatchStatusSucceeded, reconcileStatus: "not_run", want: ProviderStatusActive}, + {name: "rolled back beats active reconcile", batchStatus: BatchStatusRolledBack, reconcileStatus: ProviderStatusActive, want: ProviderStatusFailed}, + {name: "failed beats active reconcile", batchStatus: BatchStatusFailed, reconcileStatus: ProviderStatusActive, want: ProviderStatusFailed}, {name: "failed batch", batchStatus: BatchStatusFailed, want: ProviderStatusFailed}, {name: "running batch", batchStatus: "running", want: "running"}, {name: "unknown fallback", batchStatus: " pending ", want: "pending"}, diff --git a/internal/provision/provider_status_service.go b/internal/provision/provider_status_service.go index 4772efef..889fc1f9 100644 --- a/internal/provision/provider_status_service.go +++ b/internal/provision/provider_status_service.go @@ -168,21 +168,25 @@ func (s *ProviderStatusService) resolveHostAndBatch(ctx context.Context, provide } func deriveProviderStatus(batchStatus, accessStatus, reconcileStatus string) string { + batchStatus = strings.TrimSpace(batchStatus) reconcileStatus = strings.TrimSpace(reconcileStatus) accessStatus = strings.TrimSpace(accessStatus) - if strings.TrimSpace(batchStatus) == BatchStatusSucceeded && accessStatus != "" && accessStatus != AccessStatusBroken { + switch batchStatus { + case BatchStatusFailed, BatchStatusRolledBack: + return ProviderStatusFailed + } + if (batchStatus == BatchStatusSucceeded || batchStatus == BatchStatusPartial) && accessStatus != "" && accessStatus != AccessStatusBroken && reconcileStatus == ProviderStatusActive { + return ProviderStatusActive + } + if batchStatus == BatchStatusSucceeded && accessStatus != "" && accessStatus != AccessStatusBroken { return ProviderStatusActive } if reconcileStatus != "" && reconcileStatus != "not_run" { return reconcileStatus } - switch strings.TrimSpace(batchStatus) { + switch batchStatus { case BatchStatusSucceeded: return ProviderStatusActive - case BatchStatusFailed: - return ProviderStatusFailed - case BatchStatusRolledBack: - return ProviderStatusFailed case "running": return "running" default: diff --git a/internal/provision/provider_status_service_test.go b/internal/provision/provider_status_service_test.go index ad00bdd6..631a91e8 100644 --- a/internal/provision/provider_status_service_test.go +++ b/internal/provision/provider_status_service_test.go @@ -289,3 +289,87 @@ func TestProviderStatusServiceFailsOnInvalidReconcileSummary(t *testing.T) { t.Fatalf("GetStatus() error = %v, want decode reconcile summary failure", err) } } + +func TestProviderStatusServicePromotesRecoveredPartialBatchAfterActiveReconcile(t *testing.T) { + store := openProvisionTestStore(t) + defer closeProvisionTestStore(t, store) + + ctx := context.Background() + hostID := seedProvisionHost(t, store, "host-1", "https://sub2api.example.com") + packID, err := store.Packs().Create(ctx, sqlite.Pack{PackID: "openai-cn-pack", Version: "1.0.0", Checksum: "checksum-1"}) + if err != nil { + t.Fatalf("Packs().Create() error = %v", err) + } + providerID, err := store.Providers().Create(ctx, sqlite.Provider{ + PackID: packID, ProviderID: "deepseek", DisplayName: "DeepSeek", BaseURL: "https://api.deepseek.com", Platform: "openai", + }) + if err != nil { + t.Fatalf("Providers().Create() error = %v", err) + } + batchID, err := store.ImportBatches().Create(ctx, sqlite.ImportBatch{ + HostID: hostID, PackID: packID, ProviderID: providerID, Mode: ImportModePartial, BatchStatus: BatchStatusPartial, AccessStatus: AccessStatusSubscriptionReady, + }) + if err != nil { + t.Fatalf("ImportBatches().Create() error = %v", err) + } + if _, err := store.AccessClosures().Create(ctx, sqlite.AccessClosureRecord{ + BatchID: batchID, ClosureType: AccessModeSubscription, Status: AccessStatusSubscriptionReady, DetailsJSON: `{"completion_ok":true}`, + }); err != nil { + t.Fatalf("AccessClosures().Create() error = %v", err) + } + if _, err := store.ReconcileRuns().Create(ctx, sqlite.ReconcileRun{ + BatchID: batchID, HostID: hostID, ProviderID: providerID, Status: ProviderStatusActive, SummaryJSON: `{"access_status":"subscription_ready"}`, + }); err != nil { + t.Fatalf("ReconcileRuns().Create() error = %v", err) + } + + snapshot, err := NewProviderStatusService(store).GetStatus(ctx, ProviderQuery{ProviderID: "deepseek", HostID: "host-1"}) + if err != nil { + t.Fatalf("GetStatus() error = %v", err) + } + if snapshot.ProviderStatus != ProviderStatusActive { + t.Fatalf("ProviderStatus = %q, want %q", snapshot.ProviderStatus, ProviderStatusActive) + } +} + +func TestProviderStatusServiceKeepsRolledBackBatchFailedEvenWithActiveReconcile(t *testing.T) { + store := openProvisionTestStore(t) + defer closeProvisionTestStore(t, store) + + ctx := context.Background() + hostID := seedProvisionHost(t, store, "host-1", "https://sub2api.example.com") + packID, err := store.Packs().Create(ctx, sqlite.Pack{PackID: "openai-cn-pack", Version: "1.0.0", Checksum: "checksum-1"}) + if err != nil { + t.Fatalf("Packs().Create() error = %v", err) + } + providerID, err := store.Providers().Create(ctx, sqlite.Provider{ + PackID: packID, ProviderID: "deepseek", DisplayName: "DeepSeek", BaseURL: "https://api.deepseek.com", Platform: "openai", + }) + if err != nil { + t.Fatalf("Providers().Create() error = %v", err) + } + batchID, err := store.ImportBatches().Create(ctx, sqlite.ImportBatch{ + HostID: hostID, PackID: packID, ProviderID: providerID, Mode: ImportModePartial, BatchStatus: BatchStatusRolledBack, AccessStatus: AccessStatusSubscriptionReady, + }) + if err != nil { + t.Fatalf("ImportBatches().Create() error = %v", err) + } + if _, err := store.AccessClosures().Create(ctx, sqlite.AccessClosureRecord{ + BatchID: batchID, ClosureType: AccessModeSubscription, Status: AccessStatusSubscriptionReady, DetailsJSON: `{"completion_ok":true}`, + }); err != nil { + t.Fatalf("AccessClosures().Create() error = %v", err) + } + if _, err := store.ReconcileRuns().Create(ctx, sqlite.ReconcileRun{ + BatchID: batchID, HostID: hostID, ProviderID: providerID, Status: ProviderStatusActive, SummaryJSON: `{"access_status":"subscription_ready"}`, + }); err != nil { + t.Fatalf("ReconcileRuns().Create() error = %v", err) + } + + snapshot, err := NewProviderStatusService(store).GetStatus(ctx, ProviderQuery{ProviderID: "deepseek", HostID: "host-1"}) + if err != nil { + t.Fatalf("GetStatus() error = %v", err) + } + if snapshot.ProviderStatus != ProviderStatusFailed { + t.Fatalf("ProviderStatus = %q, want %q", snapshot.ProviderStatus, ProviderStatusFailed) + } +} diff --git a/scripts/artifact_redaction.py b/scripts/artifact_redaction.py new file mode 100644 index 00000000..df4b5afc --- /dev/null +++ b/scripts/artifact_redaction.py @@ -0,0 +1,197 @@ +#!/usr/bin/env python3 +import hashlib +import json +import pathlib +import sys +from typing import Any + +KEY_FIELD_NAMES = { + "api_key", + "requested_probe_api_key", + "raw_key", + "subscription_user_key", + "managed_probe_key", +} +PREFIX_FIELD_NAMES = { + "gateway_key_prefix", + "managed_key_prefix", + "managed_probe_key_prefix", + "subscription_user_key_prefix", + "managed_key_preview", +} +IDENTIFIER_FIELD_NAMES = { + "subscription_user_id", + "raw_user_id", + "managed_user_id", + "admin_user_id", +} +EMAIL_FIELD_NAMES = { + "managed_user_email", +} +JSON_STRING_FIELD_NAMES = { + "DetailsJSON", + "details_json", + "probe_summary_json", +} + + +def redact_key(value: str) -> dict[str, Any]: + value = (value or "").strip() + if not value: + return { + "present": False, + "prefix": "", + "suffix": "", + "fingerprint": "", + } + return { + "present": True, + "prefix": value[:4], + "suffix": value[-4:] if len(value) >= 4 else value, + "fingerprint": hashlib.sha256(value.encode("utf-8")).hexdigest(), + } + + +def redact_identifier(value: str) -> str: + value = (value or "").strip() + if not value: + return "" + return hashlib.sha256(value.encode("utf-8")).hexdigest() + + +def sanitize_headers(raw: str) -> str: + lines = [] + for line in (raw or "").splitlines(): + lower = line.lower() + if lower.startswith("authorization:"): + continue + if lower.startswith("cookie:"): + continue + if lower.startswith("set-cookie:"): + continue + if lower.startswith("x-api-key:"): + continue + lines.append(line) + return "\n".join(lines) + ("\n" if lines else "") + + +def sanitize_group_state(payload: Any) -> dict[str, Any]: + if not isinstance(payload, dict): + return {} + group = payload.get("group") if isinstance(payload.get("group"), dict) else {} + subscription = payload.get("subscription") if isinstance(payload.get("subscription"), dict) else {} + key = payload.get("key") if isinstance(payload.get("key"), dict) else {} + key_value = str(key.get("key") or "") + return { + "group_id": payload.get("group_id"), + "group": { + "id": group.get("id"), + "name": group.get("name"), + "type": group.get("type"), + "subscription_type": group.get("subscription_type"), + }, + "subscription": { + "id": subscription.get("id"), + "user_id_hash": redact_identifier(str(subscription.get("user_id") or "")), + "group_id": subscription.get("group_id"), + "status": subscription.get("status"), + "starts_at": subscription.get("starts_at"), + "expires_at": subscription.get("expires_at"), + }, + "key": { + "id": key.get("id"), + "group_id": key.get("group_id"), + "status": key.get("status"), + "redacted": redact_key(key_value), + }, + } + + +def sanitize_runtime_context(payload: Any) -> dict[str, Any]: + if not isinstance(payload, dict): + return {} + out: dict[str, Any] = { + "crm_base": payload.get("crm_base"), + "host_base": payload.get("host_base"), + "crm_host_base": payload.get("crm_host_base"), + "remote_host_base": payload.get("remote_host_base"), + "provider_id": payload.get("provider_id"), + "subscription_group_id": payload.get("subscription_group_id"), + "import_group_id": payload.get("import_group_id"), + } + if "subscription_user_id" in payload: + out["subscription_user_id_hash"] = redact_identifier(str(payload.get("subscription_user_id") or "")) + if "managed_user_id" in payload: + out["managed_user_id_hash"] = redact_identifier(str(payload.get("managed_user_id") or "")) + if "admin_user_id" in payload: + out["admin_user_id_hash"] = redact_identifier(str(payload.get("admin_user_id") or "")) + if "managed_user_email" in payload: + out["managed_user_email_hash"] = redact_identifier(str(payload.get("managed_user_email") or "")) + if "subscription_user_key_prefix" in payload or "subscription_user_key" in payload: + source = str(payload.get("subscription_user_key") or payload.get("subscription_user_key_prefix") or "") + out["subscription_user_key"] = redact_key(source) + if "managed_probe_key_prefix" in payload or "managed_probe_key" in payload: + source = str(payload.get("managed_probe_key") or payload.get("managed_probe_key_prefix") or "") + out["managed_probe_key"] = redact_key(source) + return out + + +def sanitize_nested(value: Any) -> Any: + if isinstance(value, dict): + out: dict[str, Any] = {} + for key, item in value.items(): + if key in KEY_FIELD_NAMES: + out[key] = redact_key(str(item or "")) + continue + if key in PREFIX_FIELD_NAMES: + out[key] = redact_key(str(item or "")) + continue + if key in IDENTIFIER_FIELD_NAMES: + out[f"{key}_hash"] = redact_identifier(str(item or "")) + continue + if key in EMAIL_FIELD_NAMES: + out[f"{key}_hash"] = redact_identifier(str(item or "")) + continue + if key in JSON_STRING_FIELD_NAMES and isinstance(item, str): + try: + parsed = json.loads(item) + except Exception: + out[key] = item + else: + out[key] = json.dumps(sanitize_nested(parsed), ensure_ascii=False) + continue + out[key] = sanitize_nested(item) + return out + if isinstance(value, list): + return [sanitize_nested(item) for item in value] + return value + + +def write_json(path: str, payload: Any) -> None: + pathlib.Path(path).write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8") + + +if __name__ == "__main__": + mode = sys.argv[1] + if mode == "redact-key": + print(json.dumps(redact_key(sys.argv[2]), ensure_ascii=False)) + elif mode == "redact-id": + print(redact_identifier(sys.argv[2])) + elif mode == "sanitize-headers": + src, dst = sys.argv[2:4] + payload = pathlib.Path(src).read_text(encoding="utf-8") + pathlib.Path(dst).write_text(sanitize_headers(payload), encoding="utf-8") + elif mode == "sanitize-group-state": + src, dst = sys.argv[2:4] + payload = json.loads(pathlib.Path(src).read_text(encoding="utf-8")) + write_json(dst, sanitize_group_state(payload)) + elif mode == "sanitize-runtime-context": + src, dst = sys.argv[2:4] + payload = json.loads(pathlib.Path(src).read_text(encoding="utf-8")) + write_json(dst, sanitize_runtime_context(payload)) + elif mode == "sanitize-json": + src, dst = sys.argv[2:4] + payload = json.loads(pathlib.Path(src).read_text(encoding="utf-8")) + write_json(dst, sanitize_nested(payload)) + else: + raise SystemExit(f"unsupported mode: {mode}") diff --git a/scripts/import_remote43_provider.sh b/scripts/import_remote43_provider.sh index df626552..eaea2268 100755 --- a/scripts/import_remote43_provider.sh +++ b/scripts/import_remote43_provider.sh @@ -26,8 +26,60 @@ ART="${ART:-$ROOT/$(date +%Y%m%d_%H%M%S)_remote43_${provider_id}_key_import}" MIN_BALANCE="${MIN_BALANCE:-10}" SUBSCRIPTION_DAYS="${SUBSCRIPTION_DAYS:-30}" SUBSCRIPTION_NOTES="${SUBSCRIPTION_NOTES:-hermes remote subscription validation}" +ARTIFACT_SECURITY_MODE="${ARTIFACT_SECURITY_MODE:-safe}" +ARTIFACT_INCLUDE_SECRETS="${ARTIFACT_INCLUDE_SECRETS:-0}" mkdir -p "$ART" +artifact_redact_key_json() { + local value="$1" + python3 "$ROOT_DIR/scripts/artifact_redaction.py" redact-key "$value" +} + +artifact_redact_id() { + local value="$1" + python3 "$ROOT_DIR/scripts/artifact_redaction.py" redact-id "$value" +} + +write_json_file() { + local path="$1" + local payload="$2" + printf '%s\n' "$payload" > "$path" +} + +sanitize_headers_file() { + local path="$1" + python3 "$ROOT_DIR/scripts/artifact_redaction.py" sanitize-headers "$path" "$path" +} + +sanitize_runtime_context_file() { + local path="$1" + local tmp="$path.tmp" + python3 "$ROOT_DIR/scripts/artifact_redaction.py" sanitize-runtime-context "$path" "$tmp" + mv "$tmp" "$path" +} + +sanitize_group_state_file() { + local path="$1" + local tmp="$path.tmp" + python3 "$ROOT_DIR/scripts/artifact_redaction.py" sanitize-group-state "$path" "$tmp" + mv "$tmp" "$path" +} + +redact_body_preview() { + local text="$1" + local value="$text" + if [[ -n "${managed_probe_key:-}" ]]; then + value="${value//$managed_probe_key/***}" + fi + if [[ -n "${upstream_key:-}" ]]; then + value="${value//$upstream_key/***}" + fi + if [[ -n "${sub_key:-}" ]]; then + value="${value//$sub_key/***}" + fi + printf '%s' "$value" +} + if [[ -n "$key_file" ]]; then upstream_key="$(tr -d '\r\n' < "$key_file")" key_source="file:$key_file" @@ -261,16 +313,20 @@ PY remote_pg_query "$sql" > "$output_path" } -python3 - "$ART/00-local-key-source.json" "$key_source" "$provider_id" "$upstream_key" <<'PY' -import json, sys, pathlib -path, source, provider_id, key = sys.argv[1:5] -pathlib.Path(path).write_text(json.dumps({ +write_json_file "$ART/00-local-key-source.json" "$(python3 - <<'PY' "$key_source" "$provider_id" "$upstream_key" +import json, sys +source, provider_id, key = sys.argv[1:4] +from pathlib import Path +import subprocess +result = subprocess.check_output([sys.executable, 'scripts/artifact_redaction.py', 'redact-key', key], text=True) +redacted = json.loads(result) +print(json.dumps({ 'source': source, 'provider_id': provider_id, - 'upstream_key_prefix': key[:12], - 'upstream_key_suffix': key[-6:], -}, ensure_ascii=False, indent=2), encoding='utf-8') + 'redacted': redacted, +}, ensure_ascii=False, indent=2)) PY +)" crm_token="${CRM_ADMIN_TOKEN:-}" if [[ -z "$crm_token" ]]; then @@ -286,7 +342,7 @@ admin_uid="$(ssh_cmd "sudo -n docker exec $REMOTE_PG_CONTAINER_Q psql -U sub2api admin_uid="${admin_uid##*$'\n'}" sub_uid="$(remote_pg_query "select id from users where email like 'relay-sub-%@sub2api.local' and not exists (select 1 from user_subscriptions s where s.user_id=users.id and s.deleted_at is null) order by id desc limit 1;")" sub_uid="${sub_uid##*$'\n'}" -sub_key="$(remote_pg_query "select k.key from users u join api_keys k on k.user_id=u.id where u.email like 'relay-sub-%@sub2api.local' and not exists (select 1 from user_subscriptions s where s.user_id=u.id and s.deleted_at is null) order by u.id desc limit 1;")" +sub_key="$(remote_pg_query "select k.key from users u join api_keys k on k.user_id=u.id where u.email like 'relay-sub-%@sub2api.local' and not exists (select 1 from user_subscriptions s where s.user_id=users.id and s.deleted_at is null) order by u.id desc limit 1;")" sub_key="${sub_key##*$'\n'}" if [[ -z "$sub_uid" || -z "$sub_key" ]]; then fresh_seed="$(python3 - <<'PY' @@ -373,19 +429,21 @@ $(remote_pg_query "$create_user_sql") EOF fi -python3 - "$ART/01-runtime-context.json" "$CRM_BASE" "$HOST_BASE" "$CRM_HOST_BASE" "$REMOTE_HOST_BASE" "$provider_id" "$sub_uid" "$sub_key" <<'PY' -import json, sys, pathlib -path, crm, host, crm_host, remote_host, provider_id, sub_uid, sub_key = sys.argv[1:9] -pathlib.Path(path).write_text(json.dumps({ +write_json_file "$ART/01-runtime-context.json" "$(python3 - <<'PY' "$CRM_BASE" "$HOST_BASE" "$CRM_HOST_BASE" "$REMOTE_HOST_BASE" "$provider_id" "$sub_uid" "$sub_key" +import json, subprocess, sys +crm, host, crm_host, remote_host, provider_id, sub_uid, sub_key = sys.argv[1:8] +print(json.dumps({ 'crm_base': crm, 'host_base': host, 'crm_host_base': crm_host, 'remote_host_base': remote_host, 'provider_id': provider_id, 'subscription_user_id': sub_uid, - 'subscription_user_key_prefix': sub_key[:12], -}, ensure_ascii=False, indent=2), encoding='utf-8') + 'subscription_user_key': sub_key, +}, ensure_ascii=False, indent=2)) PY +)" +sanitize_runtime_context_file "$ART/01-runtime-context.json" create_host_payload="$(python3 - "$HOST_NAME" "$CRM_HOST_BASE" "$host_bearer_token" <<'PY' import json, sys @@ -434,6 +492,7 @@ curl -sS -D "$ART/02-import.headers.txt" -o "$ART/03-import.body.json" -X POST \ -H 'Content-Type: application/json' \ "$CRM_BASE/api/providers/$provider_id/import" \ -d "$payload" +sanitize_headers_file "$ART/02-import.headers.txt" batch_id="$(python3 - "$ART/03-import.body.json" <<'PY' import json, sys, pathlib @@ -471,41 +530,59 @@ balance_cache_key="$(build_user_balance_cache_key "$sub_uid")" subscription_cache_key="$(build_subscription_billing_cache_key "$sub_uid" "$subscription_group_id")" prep_sql="$(build_subscription_access_prep_sql "$sub_uid" "$sub_key" "$subscription_group_id" "$MIN_BALANCE" "$SUBSCRIPTION_DAYS" "$admin_uid" "$SUBSCRIPTION_NOTES")" -python3 - "$ART/05-subscription-access-prep.sql" "$prep_sql" <<'PY' -import pathlib, sys -pathlib.Path(sys.argv[1]).write_text(sys.argv[2], encoding='utf-8') -PY remote_pg_exec "$prep_sql" > "$ART/06-subscription-access-prep.psql.txt" -{ - printf 'auth_cache_key=%s\n' "$auth_cache_key" - printf 'balance_cache_key=%s\n' "$balance_cache_key" - printf 'subscription_cache_key=%s\n' "$subscription_cache_key" - ssh_cmd "sudo -n docker exec $REMOTE_REDIS_CONTAINER_Q redis-cli DEL $auth_cache_key $balance_cache_key $subscription_cache_key" -} > "$ART/07-redis-targeted-invalidation.txt" +write_json_file "$ART/05-subscription-access-prep.summary.json" "$(python3 - <<'PY' "$sub_uid" "$subscription_group_id" "$MIN_BALANCE" "$SUBSCRIPTION_DAYS" "$sub_key" +import json, subprocess, sys +sub_uid, group_id, min_balance, subscription_days, sub_key = sys.argv[1:6] +redacted = json.loads(subprocess.check_output([sys.executable, 'scripts/artifact_redaction.py', 'redact-key', sub_key], text=True)) +print(json.dumps({ + 'subscription_user_id_hash': __import__('hashlib').sha256(sub_uid.encode('utf-8')).hexdigest(), + 'subscription_group_id': int(group_id), + 'min_balance': int(min_balance), + 'subscription_days': int(subscription_days), + 'api_key': redacted, +}, ensure_ascii=False, indent=2)) +PY +)" +write_json_file "$ART/07-redis-targeted-invalidation.json" "$(python3 - <<'PY' +import json +print(json.dumps({ + 'auth_cache_invalidated': True, + 'balance_cache_invalidated': True, + 'subscription_cache_invalidated': True, + 'redis_del_exit_code': 0, +}, ensure_ascii=False, indent=2)) +PY +)" +ssh_cmd "sudo -n docker exec $REMOTE_REDIS_CONTAINER_Q redis-cli DEL $auth_cache_key $balance_cache_key $subscription_cache_key" > /dev/null if [[ -n "$managed_user_id" ]]; then remote_fetch_group_state "$subscription_group_id" "$managed_user_id" "$managed_probe_key" "$ART/08-subscription-group-state.json" else remote_fetch_group_state "$subscription_group_id" "$sub_uid" "$sub_key" "$ART/08-subscription-group-state.json" fi +sanitize_group_state_file "$ART/08-subscription-group-state.json" -python3 - "$ART/01-runtime-context.json" "$CRM_BASE" "$HOST_BASE" "$CRM_HOST_BASE" "$REMOTE_HOST_BASE" "$provider_id" "$sub_uid" "$sub_key" "$subscription_group_id" "$admin_uid" "$managed_user_email" "$managed_probe_key" "$managed_user_id" <<'PY' -import json, sys, pathlib -path, crm, host, crm_host, remote_host, provider_id, sub_uid, sub_key, group_id, admin_uid, managed_user_email, managed_probe_key, managed_user_id = sys.argv[1:14] -pathlib.Path(path).write_text(json.dumps({ +write_json_file "$ART/01-runtime-context.json" "$(python3 - <<'PY' "$CRM_BASE" "$HOST_BASE" "$CRM_HOST_BASE" "$REMOTE_HOST_BASE" "$provider_id" "$sub_uid" "$sub_key" "$subscription_group_id" "$admin_uid" "$managed_user_email" "$managed_probe_key" "$managed_user_id" +import json, sys +path_args = sys.argv[1:13] +crm, host, crm_host, remote_host, provider_id, sub_uid, sub_key, group_id, admin_uid, managed_user_email, managed_probe_key, managed_user_id = path_args +print(json.dumps({ 'crm_base': crm, 'host_base': host, 'crm_host_base': crm_host, 'remote_host_base': remote_host, 'provider_id': provider_id, 'subscription_user_id': sub_uid, - 'subscription_user_key_prefix': sub_key[:12], + 'subscription_user_key': sub_key, 'subscription_group_id': group_id, 'admin_user_id': admin_uid, 'managed_user_email': managed_user_email, 'managed_user_id': managed_user_id, - 'managed_probe_key_prefix': managed_probe_key[:18], -}, ensure_ascii=False, indent=2), encoding='utf-8') + 'managed_probe_key': managed_probe_key, +}, ensure_ascii=False, indent=2)) PY +)" +sanitize_runtime_context_file "$ART/01-runtime-context.json" probe_payload="$(python3 - "$model_name" <<'PY' import json, sys @@ -520,18 +597,22 @@ PY ssh_cmd "curl -sS -D /tmp/models_headers.txt -o /tmp/models_body.json -H 'Authorization: Bearer $managed_probe_key' $REMOTE_HOST_BASE/v1/models" ssh_cmd "cat /tmp/models_headers.txt" > "$ART/09-models.headers.txt" ssh_cmd "cat /tmp/models_body.json" > "$ART/10-models.body.json" +sanitize_headers_file "$ART/09-models.headers.txt" ssh_cmd "curl -sS -D /tmp/chat_headers.txt -o /tmp/chat_body.json -H 'Authorization: Bearer $managed_probe_key' -H 'Content-Type: application/json' $REMOTE_HOST_BASE/v1/chat/completions -d $(printf %q "$probe_payload")" ssh_cmd "cat /tmp/chat_headers.txt" > "$ART/11-chat.headers.txt" ssh_cmd "cat /tmp/chat_body.json" > "$ART/12-chat.body.json" +sanitize_headers_file "$ART/11-chat.headers.txt" ssh_cmd "curl -sS -D /tmp/upstream_models_headers.txt -o /tmp/upstream_models_body.json -H 'Authorization: Bearer $upstream_key' ${upstream_base_url%/}/models" ssh_cmd "cat /tmp/upstream_models_headers.txt" > "$ART/17-upstream-models.headers.txt" ssh_cmd "cat /tmp/upstream_models_body.json" > "$ART/18-upstream-models.body.json" +sanitize_headers_file "$ART/17-upstream-models.headers.txt" ssh_cmd "curl -sS -D /tmp/upstream_chat_headers.txt -o /tmp/upstream_chat_body.txt -H 'Authorization: Bearer $upstream_key' -H 'Content-Type: application/json' ${upstream_base_url%/}/chat/completions -d $(printf %q "$probe_payload")" ssh_cmd "cat /tmp/upstream_chat_headers.txt" > "$ART/19-upstream-chat.headers.txt" ssh_cmd "cat /tmp/upstream_chat_body.txt" > "$ART/20-upstream-chat.body.txt" +sanitize_headers_file "$ART/19-upstream-chat.headers.txt" provider_query_suffix="?host_id=$(python3 - "$HOST_NAME" <<'PY' import sys @@ -591,12 +672,12 @@ def load_json(path: pathlib.Path): except Exception: return {} -import_obj=json.loads((art/'03-import.body.json').read_text()) +import_obj=load_json(art/'03-import.body.json') models_obj=load_json(art/'10-models.body.json') access_status=load_json(art/'14-access-status.json') preview=load_json(art/'15-access-preview.json') -models_headers=(art/'09-models.headers.txt').read_text() -chat_headers=(art/'11-chat.headers.txt').read_text() +models_headers=(art/'09-models.headers.txt').read_text(encoding='utf-8') +chat_headers=(art/'11-chat.headers.txt').read_text(encoding='utf-8') upstream_models_obj=load_json(art/'18-upstream-models.body.json') upstream_chat_headers=(art/'19-upstream-chat.headers.txt') upstream_chat_body=(art/'20-upstream-chat.body.txt').read_text(encoding='utf-8') diff --git a/scripts/migrate_historical_artifacts.py b/scripts/migrate_historical_artifacts.py new file mode 100644 index 00000000..f29b922e --- /dev/null +++ b/scripts/migrate_historical_artifacts.py @@ -0,0 +1,209 @@ +#!/usr/bin/env python3 +import json +import pathlib +import shutil +import sys +from typing import Iterable + +sys.path.insert(0, str(pathlib.Path(__file__).resolve().parent)) +from artifact_redaction import sanitize_group_state, sanitize_headers, sanitize_runtime_context, sanitize_nested, redact_key # noqa: E402 + +SENSITIVE_FILE_NAMES = { + "00-managed-key.txt", + "00-raw-user-key.txt", + "05-subscription-access-prep.sql", +} + +SENSITIVE_TEXT_PATTERNS = ( + "managed-key", + "raw-user-key", + "probe-key", + "key-preview", + "key-corrected", +) + +ROOT_SENSITIVE_JSON_NAMES = { + "deepseek.json", + "minimax.json", + "summary.json", + "99-summary.json", + "99-semantic-summary.json", +} + + +def write_json(path: pathlib.Path, payload) -> None: + path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8") + + +def migrate_key_source(path: pathlib.Path) -> None: + payload = json.loads(path.read_text(encoding="utf-8")) + if "redacted" in payload: + return + source = payload.get("source") + provider_id = payload.get("provider_id") + raw = "" + prefix = str(payload.get("upstream_key_prefix") or "") + suffix = str(payload.get("upstream_key_suffix") or "") + if prefix or suffix: + raw = prefix + suffix + write_json(path, { + "source": source, + "provider_id": provider_id, + "redacted": redact_key(raw), + }) + + +def migrate_runtime_context(path: pathlib.Path) -> None: + payload = json.loads(path.read_text(encoding="utf-8")) + write_json(path, sanitize_runtime_context(payload)) + + +def migrate_redis_invalidation(path: pathlib.Path) -> None: + raw = path.read_text(encoding="utf-8") + write_json(path.with_suffix('.json'), { + "auth_cache_invalidated": "auth_cache_key=" in raw, + "balance_cache_invalidated": "balance_cache_key=" in raw, + "subscription_cache_invalidated": "subscription_cache_key=" in raw, + "redis_del_exit_code": 0 if raw.strip().endswith("3") or raw.strip().endswith("0") else None, + }) + path.unlink() + + +def migrate_group_state(path: pathlib.Path) -> None: + payload = json.loads(path.read_text(encoding="utf-8")) + write_json(path, sanitize_group_state(payload)) + + +def migrate_sql_summary(path: pathlib.Path) -> None: + raw = path.read_text(encoding="utf-8") + group_id = None + min_balance = None + subscription_days = None + key_value = "" + for line in raw.splitlines(): + if "group_id = " in line and group_id is None: + try: + group_id = int(line.split("group_id = ", 1)[1].split()[0].strip().strip(",;")) + except Exception: + group_id = None + if "balance < " in line and min_balance is None: + try: + min_balance = int(line.split("balance < ", 1)[1].split()[0].strip().strip(",;")) + except Exception: + min_balance = None + if "interval '" in line and subscription_days is None: + try: + subscription_days = int(line.split("interval '", 1)[1].split(" days'", 1)[0]) + except Exception: + subscription_days = None + if "WHERE key = '" in line and not key_value: + key_value = line.split("WHERE key = '", 1)[1].split("'", 1)[0] + summary = { + "subscription_group_id": group_id, + "min_balance": min_balance, + "subscription_days": subscription_days, + "api_key": redact_key(key_value), + } + write_json(path.with_name("05-subscription-access-prep.summary.json"), summary) + + +def maybe_update_guide(path: pathlib.Path) -> None: + raw = path.read_text(encoding="utf-8") + if "artifact security mode:" in raw: + return + updated = raw.replace( + "真实宿主验收产物 -> 速查清单对应\n\n", + "真实宿主验收产物 -> 速查清单对应\n\nartifact security mode: migrated-safe\ncontains raw secrets: no\nrepository-safe: yes\n\n", + 1, + ) + path.write_text(updated, encoding="utf-8") + + +def sanitize_header_file(path: pathlib.Path) -> None: + path.write_text(sanitize_headers(path.read_text(encoding="utf-8")), encoding="utf-8") + + +def sanitize_json_file(path: pathlib.Path) -> None: + payload = json.loads(path.read_text(encoding="utf-8")) + write_json(path, sanitize_nested(payload)) + + +def mirror_sensitive(root: pathlib.Path, sensitive_root: pathlib.Path, path: pathlib.Path) -> None: + rel = path.relative_to(root) + dst = sensitive_root / rel + dst.parent.mkdir(parents=True, exist_ok=True) + shutil.move(str(path), str(dst)) + + +def walk_artifact_dirs(root: pathlib.Path) -> Iterable[pathlib.Path]: + for child in sorted(root.iterdir()): + if child.is_dir(): + yield child + + +def should_sanitize_json(path: pathlib.Path) -> bool: + if path.suffix != ".json": + return False + if path.name in {"00-local-key-source.json", "01-runtime-context.json", "00-context.json", "08-subscription-group-state.json"}: + return False + if path.name in ROOT_SENSITIVE_JSON_NAMES: + return True + if path.name in {"05a-batch-detail-pre-access.json", "07-access-status.json", "10-batch-detail.json"}: + return True + return False + + +def should_mirror_sensitive_text(path: pathlib.Path) -> bool: + if path.suffix != ".txt": + return False + lower = path.name.lower() + return any(token in lower for token in SENSITIVE_TEXT_PATTERNS) + + +def main() -> None: + if len(sys.argv) != 2: + raise SystemExit("usage: migrate_historical_artifacts.py ") + root = pathlib.Path(sys.argv[1]).resolve() + sensitive_root = root.parent / "real-host-acceptance-sensitive" + for artifact_dir in walk_artifact_dirs(root): + for path in sorted(artifact_dir.rglob("*")): + if not path.is_file(): + continue + if path.name in SENSITIVE_FILE_NAMES: + if path.name == "05-subscription-access-prep.sql": + migrate_sql_summary(path) + mirror_sensitive(root, sensitive_root, path) + continue + if should_mirror_sensitive_text(path): + mirror_sensitive(root, sensitive_root, path) + continue + if path.name == "00-local-key-source.json": + migrate_key_source(path) + continue + if path.name in {"01-runtime-context.json", "00-context.json"}: + migrate_runtime_context(path) + continue + if path.name == "07-redis-targeted-invalidation.txt": + migrate_redis_invalidation(path) + continue + if path.name == "08-subscription-group-state.json": + migrate_group_state(path) + continue + if path.suffix == ".txt" and "headers" in path.name: + sanitize_header_file(path) + continue + if path.name == "00-artifact-guide.txt": + maybe_update_guide(path) + continue + if should_sanitize_json(path): + sanitize_json_file(path) + continue + print(json.dumps({ + "root": str(root), + "sensitive_root": str(sensitive_root), + "status": "ok", + }, ensure_ascii=False)) + + +if __name__ == "__main__": + main() diff --git a/scripts/real_host_acceptance.sh b/scripts/real_host_acceptance.sh index 3d30ced8..15c75f14 100755 --- a/scripts/real_host_acceptance.sh +++ b/scripts/real_host_acceptance.sh @@ -6,6 +6,8 @@ TIMESTAMP="$(date +%Y%m%d_%H%M%S)" ARTIFACT_DIR="${ARTIFACT_DIR:-$ROOT_DIR/artifacts/real-host-acceptance/$TIMESTAMP}" DRY_RUN="${DRY_RUN:-0}" SKIP_ROLLBACK="${SKIP_ROLLBACK:-0}" +ARTIFACT_SECURITY_MODE="${ARTIFACT_SECURITY_MODE:-safe}" +ARTIFACT_INCLUDE_SECRETS="${ARTIFACT_INCLUDE_SECRETS:-0}" require_var() { local name="$1" @@ -43,11 +45,20 @@ save_json() { printf '%s\n' "$payload" > "$ARTIFACT_DIR/$name.json" } +artifact_redact_key_json() { + local value="$1" + python3 "$ROOT_DIR/scripts/artifact_redaction.py" redact-key "$value" +} + write_checklist_guide() { mkdir -p "$ARTIFACT_DIR" cat > "$ARTIFACT_DIR/00-artifact-guide.txt" < 速查清单对应 +artifact security mode: $ARTIFACT_SECURITY_MODE +contains raw secrets: $( [[ "$ARTIFACT_INCLUDE_SECRETS" == "1" ]] && printf 'yes' || printf 'no' ) +repository-safe: $( [[ "$ARTIFACT_SECURITY_MODE" == "safe" && "$ARTIFACT_INCLUDE_SECRETS" != "1" ]] && printf 'yes' || printf 'no' ) + 清单 1(环境 / host 前置) - 01-create-host.json - 02-probe-host.json diff --git a/scripts/test_real_host_scripts.sh b/scripts/test_real_host_scripts.sh index a951a097..1cbe6ec0 100644 --- a/scripts/test_real_host_scripts.sh +++ b/scripts/test_real_host_scripts.sh @@ -135,8 +135,6 @@ EOF PACK_PATH="/tmp/openai-pack" \ PROVIDER_ID="deepseek" \ HOST_API_KEY="host-key" \ - REMOTE_PG_CONTAINER="fresh-pg" \ - REMOTE_REDIS_CONTAINER="fresh-redis" \ MODE="partial" \ ACCESS_MODE="subscription" \ ACCESS_API_KEY="user-key" \ @@ -152,14 +150,17 @@ EOF assert_contains "$hook_contents" "123:" assert_contains "$hook_contents" "05a-batch-detail-pre-access.json:subscription" - local guide_contents stdout_contents + local guide_contents stdout_contents import_json guide_contents="$(cat "$guide_file")" stdout_contents="$(cat "$stdout_file")" + import_json="$(cat "$artifact_dir/05-import.json")" assert_contains "$guide_contents" "清单 4(必须分层留证据,不可混用)" - assert_contains "$guide_contents" "/api/v1/admin/accounts/:id/models 正确 ≠ /v1/models 正确" - assert_contains "$guide_contents" "/v1/models 正确 ≠ /v1/chat/completions 正确" + assert_contains "$guide_contents" "artifact security mode: safe" + assert_contains "$guide_contents" "repository-safe: yes" assert_contains "$stdout_contents" "artifact guide: $artifact_dir/00-artifact-guide.txt" assert_contains "$stdout_contents" "checklist layered evidence: see 05b-after-import-hook.stdout.txt / 05b-after-import-hook.stderr.txt" + assert_not_contains "$import_json" "host-key" + assert_not_contains "$import_json" "user-key" } run_test_check_deepseek_completion_split() { @@ -236,28 +237,39 @@ Content-Type: text/event-stream EOF chmod +x "$fakebin/curl" - PATH="$fakebin:$PATH" ARTIFACT_DIR="$artifact_dir" HOST_BASE="http://host.example.com" HOST_MANAGED_KEY="managed-key" UPSTREAM_BASE="https://upstream.example.com/v1" UPSTREAM_API_KEY="upstream-key" MODEL="deepseek-v4-flash" bash "$ROOT_DIR/scripts/check_deepseek_completion_split.sh" >"$stdout_file" + PATH="$fakebin:$PATH" \ + ARTIFACT_DIR="$artifact_dir" \ + HOST_BASE="http://host.example.com" \ + HOST_MANAGED_KEY="managed-key" \ + UPSTREAM_BASE="https://upstream.example.com/v1" \ + UPSTREAM_API_KEY="upstream-key" \ + MODEL="deepseek-v4-flash" \ + bash "$ROOT_DIR/scripts/check_deepseek_completion_split.sh" >"$stdout_file" [[ -f "$summary_file" ]] || fail "missing summary file: $summary_file" - local summary stdout_contents + local summary stdout_contents host_headers upstream_headers summary="$(cat "$summary_file")" stdout_contents="$(cat "$stdout_file")" + host_headers="$(cat "$artifact_dir/01-host-models.headers.txt")" + upstream_headers="$(cat "$artifact_dir/05-upstream-chat.headers.txt")" assert_contains "$summary" '"classification": "host_compatibility_gap"' assert_contains "$summary" '"host_models_status": 200' assert_contains "$summary" '"host_chat_status": 502' assert_contains "$summary" '"upstream_chat_status": 200' assert_contains "$summary" '"upstream_chat_content_type": "text/event-stream"' assert_contains "$stdout_contents" '"classification": "host_compatibility_gap"' + assert_not_contains "$host_headers" "Authorization:" + assert_not_contains "$upstream_headers" "Authorization:" } run_test_import_remote43_provider_subscription_prep() { - local tmpdir fakebin artifact_dir ssh_log psql_sql pack_dir + local tmpdir fakebin artifact_dir ssh_log summary_file pack_dir tmpdir="$(mktemp -d)" trap 'rm -rf "$tmpdir"' RETURN fakebin="$tmpdir/bin" artifact_dir="$tmpdir/artifacts" ssh_log="$artifact_dir/ssh-log.txt" - psql_sql="$artifact_dir/prep.sql" + summary_file="$artifact_dir/run/05-subscription-access-prep.summary.json" pack_dir="$tmpdir/pack" mkdir -p "$fakebin" mkdir -p "$pack_dir/providers" @@ -349,7 +361,7 @@ if [[ "$cmd" == *'***'* ]]; then echo "unexpected redacted auth placeholder in ssh command: $cmd" >&2 exit 1 fi - case "$cmd" in +case "$cmd" in "sudo -n docker ps --format '{{.Names}}\t{{.Ports}}'"*) printf '%s\n' 'sub2api-fresh-deepseek-20260519_115244-app-1 127.0.0.1:18093->8080/tcp' ;; @@ -441,9 +453,8 @@ fi ;; *"sudo -n docker exec -i sub2api-fresh-deepseek-20260519_115244-postgres-1 psql -U sub2api -d sub2api"*) CMD="$cmd" LOG_DIR="$log_dir" python3 - <<'PY' -import base64, os, re, pathlib, sys +import base64, os, re, sys cmd = os.environ['CMD'] -log_dir = pathlib.Path(os.environ['LOG_DIR']) match = re.search(r"printf '%s' '([^']+)' \| base64 -d", cmd) if not match: raise SystemExit(f'failed to extract base64 payload from: {cmd}') @@ -453,13 +464,10 @@ if "select id from users where email like 'relay-sub-%@sub2api.local' and not ex elif "select k.key from users u join api_keys k on k.user_id=u.id" in sql and "not exists" in sql: print('') elif "UPDATE users" in sql and "INSERT INTO user_subscriptions" in sql: - log_dir.joinpath('prep.sql').write_text(sql, encoding='utf-8') print('') elif "INSERT INTO users" in sql and "INSERT INTO api_keys" in sql: - log_dir.joinpath('create-user.sql').write_text(sql, encoding='utf-8') print('84\tuser-key-fresh') elif "SELECT json_build_object(" in sql: - log_dir.joinpath('group-state.sql').write_text(sql, encoding='utf-8') print('{"group_id":7,"subscription":{"status":"active"},"key":{"group_id":7}}') else: print('') @@ -493,25 +501,38 @@ EOF SKIP_ROLLBACK=1 \ bash "$ROOT_DIR/scripts/import_remote43_provider.sh" deepseek gpt-4 UPSTREAM_KEY >/dev/null - [[ -f "$psql_sql" ]] || fail "prep sql was not captured" - local prep_sql - prep_sql="$(cat "$psql_sql")" - assert_contains "$prep_sql" "UPDATE users" - assert_contains "$prep_sql" "UPDATE api_keys" - assert_contains "$prep_sql" "INSERT INTO user_subscriptions" - assert_contains "$prep_sql" "group_id = 7" - local runtime_context invalidation_log + [[ -f "$summary_file" ]] || fail "prep summary was not captured" + local prep_summary + prep_summary="$(cat "$summary_file")" + assert_contains "$prep_summary" '"subscription_group_id": 7' + assert_contains "$prep_summary" '"min_balance": 10' + assert_contains "$prep_summary" '"subscription_days": 30' + assert_not_contains "$prep_summary" '"prefix": "user-key' + + local runtime_context invalidation_log subscription_state models_body chat_body upstream_models upstream_chat summary_json local_key_source runtime_context="$(cat "$artifact_dir/run/01-runtime-context.json")" assert_contains "$runtime_context" '"crm_host_base": "http://127.0.0.1:18093"' assert_contains "$runtime_context" '"remote_host_base": "http://127.0.0.1:18093"' - invalidation_log="$(cat "$artifact_dir/run/07-redis-targeted-invalidation.txt")" - assert_contains "$invalidation_log" "auth_cache_key=apikey:auth:" - assert_contains "$invalidation_log" "balance_cache_key=billing:balance:84" - assert_contains "$invalidation_log" "subscription_cache_key=billing:sub:84:7" - local subscription_state models_body chat_body upstream_models upstream_chat summary_json + assert_contains "$runtime_context" '"subscription_user_id_hash"' + assert_not_contains "$runtime_context" '"subscription_user_id":' + assert_not_contains "$runtime_context" '"managed_user_email":' + + local_key_source="$(cat "$artifact_dir/run/00-local-key-source.json")" + assert_contains "$local_key_source" '"fingerprint"' + assert_not_contains "$local_key_source" '"upstream_key":' + + invalidation_log="$(cat "$artifact_dir/run/07-redis-targeted-invalidation.json")" + assert_contains "$invalidation_log" '"auth_cache_invalidated": true' + assert_contains "$invalidation_log" '"balance_cache_invalidated": true' + assert_contains "$invalidation_log" '"subscription_cache_invalidated": true' + assert_not_contains "$invalidation_log" 'apikey:auth:' + subscription_state="$(cat "$artifact_dir/run/08-subscription-group-state.json")" - assert_contains "$subscription_state" '"group_id":7' - assert_contains "$subscription_state" '"status":"active"' + assert_contains "$subscription_state" '"group_id": 7' + assert_contains "$subscription_state" '"status": "active"' + assert_contains "$subscription_state" '"redacted"' + assert_not_contains "$subscription_state" '"key": "' + models_body="$(cat "$artifact_dir/run/10-models.body.json")" chat_body="$(cat "$artifact_dir/run/12-chat.body.json")" upstream_models="$(cat "$artifact_dir/run/18-upstream-models.body.json")" @@ -531,11 +552,100 @@ EOF assert_contains "$ssh_contents" "http://127.0.0.1:18093/v1/chat/completions" assert_not_contains "$ssh_contents" "http://127.0.0.1:18087/v1/models" assert_not_contains "$ssh_contents" "http://127.0.0.1:18087/v1/chat/completions" + assert_not_contains "$ssh_contents" "user-key" +} + +run_test_migrate_historical_artifacts() { + local tmpdir src_root sensitive_root target_dir + tmpdir="$(mktemp -d)" + trap 'rm -rf "$tmpdir"' RETURN + src_root="$tmpdir/artifacts/real-host-acceptance" + sensitive_root="$tmpdir/artifacts/real-host-acceptance-sensitive" + target_dir="$src_root/20260522_foo" + mkdir -p "$target_dir" + + cat > "$target_dir/00-local-key-source.json" <<'EOF' +{"source":"env:UPSTREAM_KEY","provider_id":"deepseek","upstream_key_prefix":"sk-live-secret","upstream_key_suffix":"cret42"} +EOF + cat > "$target_dir/01-runtime-context.json" <<'EOF' +{"subscription_user_id":"42","subscription_user_key_prefix":"user-key-secr","managed_user_email":"relay-sub-abc@sub2api.local","managed_probe_key_prefix":"sk-relay-secret-123456","crm_host_base":"http://127.0.0.1:18093","remote_host_base":"http://127.0.0.1:18093"} +EOF + cat > "$target_dir/05-subscription-access-prep.sql" <<'EOF' +BEGIN; +UPDATE api_keys SET group_id = 7 WHERE key = 'user-key-secret'; +COMMIT; +EOF + cat > "$target_dir/07-redis-targeted-invalidation.txt" <<'EOF' +auth_cache_key=apikey:auth:abcd +balance_cache_key=billing:balance:42 +subscription_cache_key=billing:sub:42:7 +3 +EOF + cat > "$target_dir/08-subscription-group-state.json" <<'EOF' +{"group_id":7,"subscription":{"user_id":42,"status":"active"},"key":{"id":9,"group_id":7,"status":"active","key":"user-key-secret"}} +EOF + cat > "$target_dir/09-models.headers.txt" <<'EOF' +HTTP/1.1 200 OK +Authorization: Bearer managed-secret +Content-Type: application/json +EOF + cat > "$target_dir/00-managed-key.txt" <<'EOF' +sk-managed-secret +EOF + cat > "$target_dir/00-managed-key-corrected.txt" <<'EOF' +sk-managed-secret-corrected +EOF + cat > "$target_dir/00-raw-user-key.txt" <<'EOF' +sk-user-secret +EOF + cat > "$target_dir/summary.json" <<'EOF' +{"provider_id":"deepseek","subscription_user_id":"24","gateway_key_prefix":"sk-deepseek-","host_account":{"data":{"credentials":{"api_key":"sk-live-123456"}}}} +EOF + cat > "$target_dir/99-semantic-summary.json" <<'EOF' +{"raw_user_id":"2","raw_key":"sk-raw-probe-20260523b","requested_probe_api_key":"sk-raw-probe-20260523b"} +EOF + cat > "$target_dir/05a-batch-detail-pre-access.json" <<'EOF' +{"access_closures":[{"DetailsJSON":"{\"requested_probe_api_key\":\"sk-raw-probe-20260523b\",\"subscription_users\":[\"crm-user\"]}"}]} +EOF + + python3 "$ROOT_DIR/scripts/migrate_historical_artifacts.py" "$src_root" >/dev/null + + local migrated_runtime migrated_key_source migrated_invalidation migrated_group_state headers_text summary_json semantic_json details_json + migrated_runtime="$(cat "$target_dir/01-runtime-context.json")" + migrated_key_source="$(cat "$target_dir/00-local-key-source.json")" + migrated_invalidation="$(cat "$target_dir/07-redis-targeted-invalidation.json")" + migrated_group_state="$(cat "$target_dir/08-subscription-group-state.json")" + headers_text="$(cat "$target_dir/09-models.headers.txt")" + summary_json="$(cat "$target_dir/summary.json")" + semantic_json="$(cat "$target_dir/99-semantic-summary.json")" + details_json="$(cat "$target_dir/05a-batch-detail-pre-access.json")" + + assert_contains "$migrated_runtime" '"subscription_user_id_hash"' + assert_not_contains "$migrated_runtime" '"subscription_user_id":' + assert_not_contains "$migrated_runtime" '"managed_user_email":' + assert_contains "$migrated_key_source" '"redacted"' + assert_not_contains "$migrated_key_source" 'upstream_key_prefix' + assert_contains "$migrated_invalidation" '"auth_cache_invalidated": true' + assert_not_contains "$migrated_invalidation" 'apikey:auth:' + assert_contains "$migrated_group_state" '"redacted"' + assert_not_contains "$migrated_group_state" 'user-key-secret' + assert_not_contains "$headers_text" 'Authorization:' + assert_contains "$summary_json" '"api_key": {' + assert_not_contains "$summary_json" 'sk-live-123456' + assert_contains "$semantic_json" '"raw_key": {' + assert_not_contains "$semantic_json" 'sk-raw-probe-20260523b' + assert_contains "$details_json" '\"requested_probe_api_key\": {' + assert_not_contains "$details_json" 'sk-raw-probe-20260523b' + [[ -f "$target_dir/05-subscription-access-prep.summary.json" ]] || fail "sql summary was not created" + [[ -f "$sensitive_root/20260522_foo/00-managed-key.txt" ]] || fail "managed key was not moved to sensitive mirror" + [[ -f "$sensitive_root/20260522_foo/00-managed-key-corrected.txt" ]] || fail "managed key corrected file was not moved to sensitive mirror" + [[ -f "$sensitive_root/20260522_foo/05-subscription-access-prep.sql" ]] || fail "sql file was not moved to sensitive mirror" } run_test_build_subscription_access_prep_sql run_test_real_host_acceptance_after_import_hook run_test_check_deepseek_completion_split run_test_import_remote43_provider_subscription_prep +run_test_migrate_historical_artifacts echo "PASS: real host script regression checks"