.agent/skills/false-negative-status-triage/SKILL.md

---
name: false-negative-status-triage
description: Diagnose and fix false-negative status signals when control-plane status says something is degraded or broken but real user traffic works. Use this whenever provider status, account status, probe status, route health, or inventory state disagrees with real `/models`, `/chat/completions`, usage logs, or verified user flows. Also use it for Chinese requests such as “误报”, “false-negative”, “状态语义不一致”, “provider_status 不准”, “last_probe_status 错误”, or “真实数据面可用但后台还显示失败”.
---

# False-Negative Status Triage

This skill reconciles status signals that disagree with each other.

Problem pattern:

- real request path works
- status projection still says degraded, broken, or failed

Treat this first as a status-modeling problem, not as an outage.

## Four-layer comparison

Always compare these layers side by side:

1. import batch result
2. provider snapshot or aggregate status
3. provider account inventory status
4. real data-plane evidence

Do not jump directly from import noise to a user-facing conclusion.
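A minimal sketch of what "side by side" could look like in code. The `LayerComparison` type and its field names are hypothetical and not taken from the project; the negative values mirror the "degraded, broken, or failed" wording above.

```python
from dataclasses import dataclass

# Status strings that count as "negative" in this sketch (assumed spelling).
NEGATIVE = {"degraded", "broken", "failed"}


@dataclass(frozen=True)
class LayerComparison:
    """One row per provider: the four layers held next to each other."""

    batch_result: str      # layer 1: import batch result, e.g. "partial"
    provider_status: str   # layer 2: provider snapshot or aggregate status
    account_status: str    # layer 3: provider account inventory status
    data_plane_ok: bool    # layer 4: did real /models and /chat/completions work?

    def is_false_negative_candidate(self) -> bool:
        """True when real traffic works but a status layer still reports trouble."""
        return self.data_plane_ok and (
            self.provider_status in NEGATIVE or self.account_status in NEGATIVE
        )
```

Holding all four values in one record makes it harder to reason from the batch result alone.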

## Meaning of each layer

Keep these separate:

- `batch_status`: did every import-time check pass?
- `provider_status`: is the provider actually usable at provider level?
- `account_status`: what should operators believe about a specific account asset?
- `last_probe_status`: what did the most recent probe, or its normalized diagnostic view, report?

These should not all collapse to the same string.
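One way to keep the four meanings from collapsing is to give each its own type. The sketch below assumes Python enums. Values such as `partial`, `active`, `broken`, `gateway_ready`, and `warning` appear in this document; all other member names are hypothetical placeholders.

```python
from enum import Enum


class BatchStatus(Enum):
    PASSED = "passed"          # hypothetical
    PARTIAL = "partial"
    FAILED = "failed"


class ProviderStatus(Enum):
    ACTIVE = "active"
    DEGRADED = "degraded"
    BROKEN = "broken"


class AccountStatus(Enum):
    ACTIVE = "active"
    BROKEN = "broken"


class ProbeStatus(Enum):
    OK = "ok"                  # hypothetical
    FAILED = "failed"
    GATEWAY_READY = "gateway_ready"
    WARNING = "warning"


# Members of different enums never compare equal, even with the same string value,
# so a failed batch cannot silently stand in for a failed probe.
same_string = BatchStatus.FAILED.value == ProbeStatus.FAILED.value
same_meaning = BatchStatus.FAILED == ProbeStatus.FAILED
```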

## Preferred source of truth

When real user traffic and probe-only signals disagree:

- trust real data-plane success over probe-only failure
- trust host `usage_logs` over display counters when available
- trust access closure readiness over a single noisy account probe for provider-level availability
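The precedence above can be sketched as two small functions. Both function names and their parameters are hypothetical. The behavior when real traffic is observed to fail is an assumption; the rules above only cover the success case.

```python
from typing import Optional


def provider_usable(data_plane_ok: Optional[bool], access_closure_ready: bool) -> bool:
    """Provider-level availability under the stated precedence.

    data_plane_ok is None when no real traffic evidence exists yet.
    A single account probe is deliberately not an input here.
    """
    if data_plane_ok is not None:
        # Real traffic evidence outranks probe-only signals.
        # Assumption: observed traffic failure means "not usable".
        return data_plane_ok
    # No traffic evidence: access closure readiness decides.
    return access_closure_ready


def effective_request_count(usage_log_count: Optional[int], display_counter: int) -> int:
    """Prefer the count derived from usage_logs; fall back to the display counter."""
    return usage_log_count if usage_log_count is not None else display_counter
```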

## Normalization strategy

Use a narrow rule rather than promoting every degraded status to healthy.

Good example, where all of the following hold:

- batch is partial
- access closure is ready
- only one imported account resource exists
- smoke model is actually present
- raw account probe failed

In this case:

- provider-level state can be `active`
- account inventory may be normalized away from `broken`
- probe display can become `gateway_ready` or `warning`

The point is to remove false negatives without hiding real breakage.
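A minimal sketch of such a narrow rule, assuming a hypothetical `Snapshot` record. The target value for the account (`active`) is an assumption: the text only says the account is normalized away from `broken`. `gateway_ready` is chosen here, and `warning` would be equally valid.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    batch_status: str
    access_closure_ready: bool
    account_count: int
    smoke_model_present: bool
    raw_probe_status: str


def normalize(s: Snapshot) -> Optional[Dict[str, str]]:
    """Return normalized statuses for the narrow case, or None to leave everything as-is."""
    narrow_case = (
        s.batch_status == "partial"
        and s.access_closure_ready
        and s.account_count == 1
        and s.smoke_model_present
        and s.raw_probe_status == "failed"
    )
    if not narrow_case:
        return None
    return {
        "provider_status": "active",
        "account_status": "active",            # assumed target, see lead-in
        "last_probe_status": "gateway_ready",  # or "warning"
    }
```

Returning `None` outside the narrow case keeps the rule from silently promoting anything else.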

## What must remain strict

Do not normalize away these cases:

- strict import failures
- rolled-back batches
- broken access closure
- missing smoke model
- multi-account scenarios where one account may really be bad

This skill is about reducing noise, not erasing legitimate failures.
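The strict cases can be expressed as an explicit guard that runs before any normalization. This is a sketch under assumed inputs; the function name and parameters are hypothetical.

```python
from typing import List


def blocking_reasons(
    strict_import_failed: bool,
    batch_rolled_back: bool,
    access_closure_ready: bool,
    smoke_model_present: bool,
    account_count: int,
) -> List[str]:
    """List every reason normalization must NOT be applied. Empty list means allowed."""
    reasons: List[str] = []
    if strict_import_failed:
        reasons.append("strict import failure")
    if batch_rolled_back:
        reasons.append("rolled-back batch")
    if not access_closure_ready:
        reasons.append("broken access closure")
    if not smoke_model_present:
        reasons.append("missing smoke model")
    if account_count > 1:
        reasons.append("multiple accounts: one may really be bad")
    return reasons
```

Returning the reasons rather than a bare boolean lets the status view explain why a failure was left untouched.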

## Fix workflow

### 1. Reproduce the disagreement

Capture:

- provider snapshot
- provider account inventory row
- real `/models` result
- real `/chat/completions` result
- usage log evidence if possible
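A sketch of capturing all of this evidence in one timestamped record. The `capture_evidence` helper and the source names are hypothetical. The fetch callables stand in for whatever client or query the environment actually provides.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def capture_evidence(sources: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Call each evidence source once and record success or failure per source."""
    evidence: Dict[str, Any] = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    for name, fetch in sources.items():
        try:
            evidence[name] = {"ok": True, "result": fetch()}
        except Exception as exc:  # a failing source is itself evidence, so keep going
            evidence[name] = {"ok": False, "error": repr(exc)}
    return evidence


def _failing_probe() -> Any:
    raise TimeoutError("probe timed out")


# Usage with stub sources; real ones would query the control plane and the gateway.
sample = capture_evidence({
    "provider_snapshot": lambda: {"provider_status": "degraded"},
    "models": lambda: ["smoke-model"],
    "account_probe": _failing_probe,
})
```

Capturing everything in one pass keeps the layers comparable at a single point in time.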

### 2. Identify the wrong abstraction boundary

Typical causes:

- provider status derived too directly from batch partiality
- account inventory mirrors raw probe status instead of normalized availability
- advisory or transient probe failure treated as definitive breakage

### 3. Add tests first

Write regression tests for:

- provider-level promotion when access is truly ready
- account-level normalization only in the intended narrow scenario
- guardrails that keep real broken cases broken
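A sketch of the provider-level and guardrail tests using `unittest`. The `provider_status` function is a hypothetical stand-in for the code under test, and the `degraded` fallback is an assumption. Account-level tests would follow the same shape.

```python
import unittest


def provider_status(batch_status: str, access_ready: bool,
                    account_count: int, smoke_model_present: bool) -> str:
    """Hypothetical function under test: promote only in the narrow case."""
    if (batch_status == "partial" and access_ready
            and account_count == 1 and smoke_model_present):
        return "active"
    return "degraded"


class NormalizationRegressionTest(unittest.TestCase):
    def test_promotes_when_access_is_truly_ready(self):
        self.assertEqual(provider_status("partial", True, 1, True), "active")

    def test_guardrails_keep_broken_cases_broken(self):
        cases = [
            ("rolled_back", True, 1, True),  # rolled-back batch
            ("partial", False, 1, True),     # broken access closure
            ("partial", True, 1, False),     # missing smoke model
            ("partial", True, 2, True),      # multi-account scenario
        ]
        for case in cases:
            with self.subTest(case=case):
                self.assertNotEqual(provider_status(*case), "active")
```

Run it with `python -m unittest` against the module that contains it.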

### 4. Change semantics minimally

- keep raw batch detail truthful
- normalize higher-level status only where it improves operational meaning
- avoid changing unrelated enums or broad behavior

### 5. Verify on a live sample

Re-read the same provider and account in a real environment after deployment.

You want to see:

- raw batch still truthful
- aggregate provider state corrected
- account inventory corrected
- real request path still working
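The four expectations can be checked mechanically against before and after snapshots. The dictionary keys and the `verify_live_sample` name are hypothetical; this is a sketch, not the project's verification tooling.

```python
from typing import Dict, List

NEGATIVE = {"degraded", "broken", "failed"}


def verify_live_sample(before: Dict[str, object], after: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the fix behaved as intended."""
    problems: List[str] = []
    if after["batch_status"] != before["batch_status"]:
        problems.append("raw batch status was rewritten")
    if after["provider_status"] in NEGATIVE:
        problems.append("aggregate provider state still negative")
    if after["account_status"] in NEGATIVE:
        problems.append("account inventory still negative")
    if not after["data_plane_ok"]:
        problems.append("real request path no longer works")
    return problems
```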

chore(skills): add project workflow skills 2026-05-30 14:55:16 +08:00