diff --git a/.agent/skills/false-negative-status-triage/SKILL.md b/.agent/skills/false-negative-status-triage/SKILL.md new file mode 100644 index 00000000..df653f23 --- /dev/null +++ b/.agent/skills/false-negative-status-triage/SKILL.md @@ -0,0 +1,122 @@ +--- +name: false-negative-status-triage +description: Diagnose and fix false-negative status signals when control-plane status says something is degraded or broken but real user traffic works. Use this whenever provider status, account status, probe status, route health, or inventory state disagrees with real `/models`, `/chat/completions`, usage logs, or verified user flows. Also use it for Chinese requests such as “误报”, “false-negative”, “状态语义不一致”, “provider_status 不准”, “last_probe_status 错误”, or “真实数据面可用但后台还显示失败”. +--- + +# False-Negative Status Triage + +This skill is for signal reconciliation. + +Problem pattern: + +- real request path works +- status projection still says degraded, broken, or failed + +Treat this as a modeling problem first, not an outage first. + +## Four-layer comparison + +Always compare these layers side by side: + +1. import batch result +2. provider snapshot or aggregate status +3. provider account inventory status +4. real data-plane evidence + +Do not jump directly from import noise to a user-facing conclusion. + +## Meaning of each layer + +Keep these separate: + +- `batch_status`: did every import-time check pass? +- `provider_status`: is the provider actually usable at provider level? +- `account_status`: what should operators believe about a specific account asset? +- `last_probe_status`: what happened in the last probe or normalized diagnostic view? + +These should not all collapse to the same string. 
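One way to keep the four fields separate while still spotting the problem pattern is sketched below. It is a minimal illustration, not the project's implementation: the `LayerSnapshot` type, the `data_plane_ok` flag, the function name, and the set of negative status strings are all assumptions, since this skill does not list the real enum values.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: the skill does not list the real enum values.
NEGATIVE_VALUES = {"degraded", "broken", "failed"}


@dataclass(frozen=True)
class LayerSnapshot:
    batch_status: str        # raw import-time result, kept truthful
    provider_status: str     # provider-level usability
    account_status: str      # operator-facing view of one account asset
    last_probe_status: str   # last probe or normalized diagnostic view
    data_plane_ok: bool      # real /models and /chat/completions succeeded


def false_negative_candidates(snap: LayerSnapshot) -> list[str]:
    """Return projection layers that report failure while real traffic works."""
    if not snap.data_plane_ok:
        # Real traffic is failing too, so negative statuses may be correct.
        return []
    projections = {
        "provider_status": snap.provider_status,
        "account_status": snap.account_status,
        "last_probe_status": snap.last_probe_status,
    }
    # batch_status is deliberately excluded: raw batch detail stays as recorded.
    return [name for name, value in projections.items() if value in NEGATIVE_VALUES]
```

A non-empty result only marks layers worth investigating. Whether a layer should actually be normalized still depends on the narrow rules and guardrails described in this skill.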
+ +## Preferred source of truth + +When real user traffic and probe-only signals disagree: + +- trust real data-plane success over probe-only failure +- trust host `usage_logs` over display counters when available +- trust access closure readiness over a single noisy account probe for provider-level availability + +## Normalization strategy + +Use a narrow rule rather than promoting everything. + +Good example: + +- batch is partial +- access closure is ready +- only one imported account resource exists +- smoke model is actually present +- raw account probe failed + +In this case: + +- provider-level state can be `active` +- account inventory may be normalized away from `broken` +- probe display can become `gateway_ready` or `warning` + +The point is to remove false negatives without hiding real breakage. + +## What must remain strict + +Do not normalize away these cases: + +- strict import failures +- rolled back batches +- broken access closure +- missing smoke model +- multi-account scenarios where one account may really be bad + +This skill is about reducing noise, not erasing legitimate failures. + +## Fix workflow + +### 1. Reproduce the disagreement + +Capture: + +- provider snapshot +- provider account inventory row +- real `/models` result +- real `/chat/completions` result +- usage log evidence if possible + +### 2. Identify the wrong abstraction boundary + +Typical causes: + +- provider status derived too directly from batch partiality +- account inventory mirrors raw probe status instead of normalized availability +- advisory or transient probe failure treated as definitive breakage + +### 3. Add tests first + +Write regression tests for: + +- provider-level promotion when access is truly ready +- account-level normalization only in the intended narrow scenario +- guardrails that keep real broken cases broken + +### 4. 
Change semantics minimally + +- keep raw batch detail truthful +- normalize higher-level status only where it improves operational meaning +- avoid changing unrelated enums or broad behavior + +### 5. Verify on a live sample + +Re-read the same provider and account on a real environment after deployment. + +You want to see: + +- raw batch still truthful +- aggregate provider state corrected +- account inventory corrected +- real request path still working diff --git a/.agent/skills/remote-truth-closure/SKILL.md b/.agent/skills/remote-truth-closure/SKILL.md new file mode 100644 index 00000000..6a3c9a9c --- /dev/null +++ b/.agent/skills/remote-truth-closure/SKILL.md @@ -0,0 +1,138 @@ +--- +name: remote-truth-closure +description: Drive long-running implementation tasks all the way through local gates, commit and push, remote deployment, real verification, and truthful status reporting. Use this whenever the user asks to continue a multi-step task, deploy to a real server, verify whether something is truly complete, push to multiple remotes, clean a noisy remote environment, or recover a task that may have drifted from reality. Also use it for Chinese requests such as “继续推进”, “部署到 remote43”, “是否真实验证”, “任务是否完成”, or “推到三个仓库”. +--- + +# Remote Truth Closure + +This skill exists to prevent false completion. + +Core idea: + +`local green != task complete` + +A task is only complete when the truth chain is closed: + +1. local code and tests +2. commit and push +3. real deployment target updated +4. real endpoint or real user path verified +5. 
status board updated truthfully + +## When to use + +Use this skill when any of the following are true: + +- The user asks to continue a long-running implementation +- The task includes deploy, push, restart, remote verification, or cleanup +- The user asks whether something is really done +- You need to compare local state vs remote truth +- You are updating `EXECUTION_BOARD.md` or an equivalent source of truth + +## Principles + +- Prefer evidence over assumptions +- Keep code changes and verification records separate +- Treat remote cleanup as part of verification quality +- Never let docs claim more than the evidence supports + +## Workflow + +### 1. Reconstruct current truth + +- Check tracked vs untracked changes +- Confirm repo, branch, target server, and active instance directory +- Read the latest execution board or runbook before changing it +- Identify whether previous progress was local-only or remotely verified + +### 2. Close the local gate + +- Add or update regression tests first +- Run the repo’s required quality gates +- Confirm the exact bug or missing behavior before implementing the fix + +### 3. Commit in two layers + +Prefer two commits when appropriate: + +- feature or fix commit +- verification or documentation commit + +This keeps behavior changes separate from proof recording. + +### 4. Deploy carefully + +When deploying to a fixed checkout: + +- update the checkout to the exact target commit +- fetch to a temporary ref if the checked-out branch refuses direct fetch +- replace the real app binary, not just a build artifact +- stop the active listener PID explicitly +- restart from the real app directory with the real env file loaded + +### 5. Verify remotely + +Always verify: + +- `/healthz` +- the actual API or page you changed +- the exact request path the user cares about + +If possible, verify the running commit or binary hash. + +## Verification hierarchy + +Trust signals in this order: + +1. real user data plane result +2. 
host `usage_logs` or equivalent request evidence +3. route decision or sticky or failover logs +4. provider or inventory projections +5. probe-only diagnostics + +If a lower-level status disagrees with a higher-level proof, treat it as a false-negative candidate. + +## Remote cleanup + +Use cleanup to reduce noise, not to hide evidence. + +- Keep one active app directory and one latest fixed checkout when possible +- Remove stale bundles, duplicate stacks, and dead helper processes only after confirming they are inactive +- Clean temporary test entities after verification if they would pollute future checks + +## Status board rules + +Only update the execution board after you have evidence. + +Each verification entry should capture: + +- exact commit IDs +- local gates run +- remote target updated +- concrete endpoint or request used +- key returned values +- what remains noisy or incomplete + +Never write “completed” if deployment or remote verification is still blocked. + +## Final answer checklist + +Before claiming completion, confirm all of these: + +- local tests passed +- intended files were committed +- push to all required remotes succeeded +- target remote is running the intended commit +- the changed endpoint or user path was verified on the real target +- docs were updated truthfully +- unrelated artifacts were not accidentally committed + +## Anti-patterns + +Avoid these: + +- claiming done after local tests only +- using doc text as proof of behavior +- restarting by process name when multiple binaries may exist +- trusting stale probe noise over real successful requests +- committing scratch artifacts or research notes by accident diff --git a/.agent/skills/routing-data-plane-e2e/SKILL.md b/.agent/skills/routing-data-plane-e2e/SKILL.md new file mode 100644 index 00000000..2e9e320e --- /dev/null +++ b/.agent/skills/routing-data-plane-e2e/SKILL.md @@ -0,0 +1,124 @@ +--- +name: routing-data-plane-e2e +description: Build or verify end-to-end 
control-plane plus data-plane flows for logical-group routing systems. Use this whenever the task involves logical groups, routes, shadow groups, shadow hosts, shadow models, managed keys, chat proxy bridges, sticky routing, failover, cooldowns, route health, or proving that a routed request really hit the intended upstream account. Also use it for Chinese requests such as “逻辑分组”, “智能路由”, “shadow_group”, “managed key”, “真实 chat/completions 验证”, or “从控制面到数据面闭环”. +--- + +# Routing Data Plane E2E + +This skill is for systems where configuration and request execution are split across layers. + +Typical shape: + +- control plane: logical groups, routes, public models, bindings +- runtime plane: sticky state, cooldowns, failover counters +- data plane: actual `/v1/chat/completions` or similar request path + +The job is not finished until all three layers line up. + +## Core model + +Use this mental model: + +- `logical_group` = product-facing grouping +- `route` = routing choice inside that grouping +- `shadow_group` = real host-side target group +- `shadow_model` = real host-side model name + +Keep aliases and public naming in the plugin layer whenever possible. + +## Canonical shadow rule + +Do not push public alias complexity down into the stock host unless you have verified the host supports it cleanly. + +Preferred approach: + +- plugin exposes public model naming +- shadow provider uses canonical upstream model naming +- host only sees the canonical shadow model + +Use this especially if alias-in-shadow previously caused pricing restriction or availability bugs. + +## Acceptance chain + +### 1. Control plane + +- create or load the `logical_group` +- create or load the `route` +- bind public model to route +- confirm shadow host, shadow group, and shadow model are present + +### 2. Runtime plane + +- verify sticky backend +- verify route health or route state +- if applicable, verify cooldown and failure counters + +### 3. 
Resolve step + +- call the resolver +- capture selected `route_id` +- capture selected `shadow_host_id` +- capture selected `shadow_group_id` +- capture selected `shadow_model` + +### 4. Data plane + +- send a real routed request through the formal routing endpoint +- verify upstream status code +- verify the selected route in decision logs + +### 5. Host-side proof + +- verify host-side usage evidence +- confirm the request hit the intended group, account, and model + +If step 5 is missing, you have a plausible path, not a proved path. + +## Preferred proof sources + +Use these together: + +- route decision logs +- sticky audit logs +- failover event logs +- host `usage_logs` +- host account or group metadata + +Do not rely only on: + +- front-end page state +- route list APIs +- `api_keys.usage_*` counters if host `usage_logs` are available + +## Sticky and failover expectations + +For stable verification: + +- first request should usually bind sticky +- second identical request should usually hit sticky +- cooldown or failure threshold should cause fallback +- fallback should produce a failover event +- repeated request after fallback should stabilize on the new route + +If these do not line up, the route layer is not yet production-ready. + +## Host limitations to watch + +- one host group may not support multiple channels cleanly +- alias-based model restriction can cause apparent availability with actual request failure +- route binding ambiguity can pollute provider account ownership + +If the host resists the desired abstraction, move complexity back into the plugin layer rather than forcing the host model. 
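The sticky and failover expectations listed earlier in this skill can be turned into a small check over four observed requests. This is only a sketch under stated assumptions: the `Observation` fields and the function name are hypothetical, and the values would have to be read from whatever decision, sticky audit, and failover logs the real system provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    route_id: str          # route named in the decision log (hypothetical field)
    sticky_hit: bool       # served from an existing sticky binding
    failover_event: bool   # a failover event was logged for this request


def sticky_failover_violations(
    first: Observation,
    second: Observation,
    fallback: Observation,
    repeat: Observation,
) -> list[str]:
    """Compare four observed requests against the usual sticky/failover pattern."""
    problems = []
    if first.sticky_hit:
        problems.append("first request hit sticky instead of binding it")
    if not second.sticky_hit or second.route_id != first.route_id:
        problems.append("second request did not hit the sticky route")
    if fallback.route_id == first.route_id:
        problems.append("cooldown or failure threshold did not cause fallback")
    elif not fallback.failover_event:
        problems.append("fallback produced no failover event")
    if repeat.route_id != fallback.route_id:
        problems.append("repeated request did not stabilize on the new route")
    return problems
```

Because the expectations are phrased as "usually", a non-empty list is a prompt to inspect the logs rather than an automatic failure.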
+ +## Final E2E checklist + +Before calling a routing task complete, confirm: + +- logical group exists +- route exists +- public model binding exists +- provider account is imported and bound +- effective gateway key source is known +- routed request returns the expected upstream status +- host usage evidence matches the intended route, account, and model +- temporary test entities are cleaned if they would pollute later work
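The checklist item about host usage evidence could be checked mechanically along the following lines. This is a hedged sketch: the dictionary keys (`shadow_group_id`, `account_id`, `shadow_model`, `group_id`, `model`) and the function name are assumptions for illustration, not the real resolver output or `usage_logs` schema.

```python
from typing import Optional

# Hypothetical mapping from resolver output keys to host usage-log keys.
FIELD_PAIRS = (
    ("shadow_group_id", "group_id"),
    ("account_id", "account_id"),
    ("shadow_model", "model"),
)


def proof_gaps(resolved: dict, usage_row: Optional[dict]) -> list[str]:
    """List mismatches between the resolved route and host usage evidence."""
    if usage_row is None:
        # Without host-side evidence the routed path is only plausible.
        return ["no host usage evidence: the path is plausible, not proved"]
    gaps = []
    for resolved_key, usage_key in FIELD_PAIRS:
        expected = resolved.get(resolved_key)
        actual = usage_row.get(usage_key)
        if expected != actual:
            gaps.append(f"{usage_key}: expected {expected!r}, host logged {actual!r}")
    return gaps
```

An empty list means the usage row matches the intended group, account, and model. The other checklist items still need their own evidence.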