# Source/Extraction Agent

Self-contained reasoner built on the 12 standards. It combines the
canonical standards in `src/standards.ts` with the per-agent fill-ins in
this folder's `matrix.ts` into a single system prompt that bounds every
LLM judgment call.
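
As a rough illustration of that assembly, the sketch below joins each canonical standard with its per-agent fill-in. The `Standard` and `MatrixRow` shapes, the `buildSystemPrompt` name, and the prompt wording are all assumptions made for this example; the actual types and text live in `src/standards.ts`, `matrix.ts`, and `prompt.ts`.

```typescript
// Hypothetical shapes, for illustration only.
interface Standard {
  id: number; // 1..12
  title: string;
  rule: string; // canonical wording shared by every agent
}

interface MatrixRow {
  standardId: number;
  fillIn: string; // how this agent applies the standard
}

// Pair every canonical standard with this agent's matrix row and
// render the pairs as one system prompt string.
function buildSystemPrompt(standards: Standard[], matrix: MatrixRow[]): string {
  const rowById = new Map<number, MatrixRow>();
  for (const row of matrix) {
    rowById.set(row.standardId, row);
  }

  const sections = standards.map((std) => {
    const row = rowById.get(std.id);
    if (!row) {
      // A standard without a fill-in would leave an LLM call unbounded.
      throw new Error(`matrix has no row for standard ${std.id}`);
    }
    return [
      `Std ${std.id}: ${std.title}`,
      `Rule: ${std.rule}`,
      `For this agent: ${row.fillIn}`,
    ].join("\n");
  });

  return ["You are the Source/Extraction agent.", ...sections].join("\n\n");
}
```

Failing loudly on a missing row is a design choice of the sketch: it keeps the prompt from silently omitting a standard.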

## Files

| File | What it holds |
|---|---|
| `matrix.ts` | The 12 per-agent matrix rows + the typed `AgentStandardsContract` derived from them |
| `schema.ts` | Zod input (JobRequest) + output (Structured Payload) schemas |
| `runbook.ts` | Re-export of the matrix's `runbook` array |
| `prompt.ts` | Assembles the 12 canonical standards and the matrix rows into the system prompt |
| `llm.ts` | Self-contained Anthropic SDK wrapper + LLM judgment functions |
| `index.ts` | `runSourceExtraction(input, ctx)` — the executor that walks the runbook |

## Runbook

1. **validate-input** (Std 2) — deterministic schema validation of the JobRequest.
2. **retrieve** (Std 4, Std 5) — for each `(entity, source)` pair, call the connector dispatcher (`src/tools/retrieval/dispatcher.ts`).
3. **parse-and-extract** (Std 3, Std 5) — LLM judgment: locate target metric values inside the raw payload, cite the snippet, score confidence.
4. **structure** (Std 4) — deterministic: stamp provenance (`sourceUrl`, `capturedAt`, `sourceConnector`) on every value.
5. **validate-output** (Std 7) — deterministic: coverage, lineage, confidence, validation status; detect duplicates + mixed units → unresolved issues + comparability notes.
6. **handoff** (Std 11) — package the Structured Payload for the Normalization agent.
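
The two deterministic steps in the middle of the runbook (4 and 5) can be sketched as below. Only the provenance field names `sourceUrl`, `capturedAt`, and `sourceConnector` come from this README. The `ExtractedValue` and `OutputCheck` shapes and both function names are hypothetical. The sketch also assumes a "duplicate" means more than one value for the same entity, metric, and source, and that "mixed units" means one metric reported in more than one unit; the real checks may define these differently.

```typescript
// Hypothetical shapes; the real schemas live in schema.ts.
interface ExtractedValue {
  entity: string;
  metric: string;
  value: number;
  unit: string;
  confidence: number; // 0..1, scored during parse-and-extract
}

interface Provenance {
  sourceUrl: string;
  capturedAt: string; // ISO 8601 timestamp
  sourceConnector: string;
}

type StampedValue = ExtractedValue & Provenance;

// Step 4 (structure): stamp provenance on every extracted value.
function stampProvenance(values: ExtractedValue[], provenance: Provenance): StampedValue[] {
  return values.map((v) => ({ ...v, ...provenance }));
}

interface OutputCheck {
  unresolvedIssues: string[];
  comparabilityNotes: string[];
}

// Step 5 (validate-output), duplicate and mixed-unit checks only.
function checkOutput(values: StampedValue[]): OutputCheck {
  const countByKey = new Map<string, number>();
  const unitsByMetric = new Map<string, Set<string>>();

  for (const v of values) {
    const key = `${v.entity} | ${v.metric} | ${v.sourceUrl}`;
    countByKey.set(key, (countByKey.get(key) ?? 0) + 1);

    const units = unitsByMetric.get(v.metric) ?? new Set<string>();
    units.add(v.unit);
    unitsByMetric.set(v.metric, units);
  }

  const check: OutputCheck = { unresolvedIssues: [], comparabilityNotes: [] };
  countByKey.forEach((count, key) => {
    if (count > 1) {
      check.unresolvedIssues.push(`duplicate: ${count} values for ${key}`);
    }
  });
  unitsByMetric.forEach((units, metric) => {
    if (units.size > 1) {
      check.comparabilityNotes.push(`mixed units for ${metric}: ${Array.from(units).join(", ")}`);
    }
  });
  return check;
}
```

Neither function calls an LLM, so the same input always produces the same stamped values and the same issue list.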

## Mixed intelligence

The skeleton is deterministic; the LLM is used only where judgment is
required (step 3). If `ANTHROPIC_API_KEY` is not configured, step 3
returns a structured `needs-api-key` failure; steps 1, 2, 4, 5, and 6
still execute, and the agent escalates the missing-LLM issue for
human-in-the-loop (HITL) review.
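
A minimal sketch of that degradation path follows. The step names and the `needs-api-key` status come from this README; `StepResult`, `RunResult`, and `runSketch` are hypothetical, and the step bodies are stubbed out. The key is passed in as an argument so the example does not depend on the environment, whereas the real executor presumably reads `ANTHROPIC_API_KEY` itself and runs asynchronously.

```typescript
type StepId =
  | "validate-input"
  | "retrieve"
  | "parse-and-extract"
  | "structure"
  | "validate-output"
  | "handoff";

// Hypothetical result shapes, for illustration only.
interface StepResult {
  step: StepId;
  status: "ok" | "needs-api-key";
}

interface RunResult {
  steps: StepResult[];
  escalations: string[]; // issues routed to a human reviewer
}

const RUNBOOK: StepId[] = [
  "validate-input",
  "retrieve",
  "parse-and-extract",
  "structure",
  "validate-output",
  "handoff",
];

// Walk the runbook in order. Only parse-and-extract needs the LLM;
// without a key it records a structured failure and the walk continues.
function runSketch(apiKey: string | undefined): RunResult {
  const result: RunResult = { steps: [], escalations: [] };
  for (const step of RUNBOOK) {
    if (step === "parse-and-extract" && !apiKey) {
      result.steps.push({ step, status: "needs-api-key" });
      result.escalations.push("LLM unavailable: ANTHROPIC_API_KEY is not configured");
      continue;
    }
    // Deterministic work (or the LLM call) would run here.
    result.steps.push({ step, status: "ok" });
  }
  return result;
}
```

Calling `runSketch(undefined)` yields six step results, with the third marked `needs-api-key` and one escalation recorded; with a key present, every step reports `ok` and nothing is escalated.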