# Normalization Agent

A self-contained reasoner that covers all 12 standards. It maps raw labels and
entities to a canonical form using learned mappings, with LLM judgment for novel cases.

## Files

| File | Contents |
|---|---|
| `matrix.ts` | 12 per-agent matrix rows + the derived `AgentStandardsContract` |
| `schema.ts` | Zod input (Source/Extraction Structured Payload) + output (NormalizationOutput) schemas |
| `runbook.ts` | Re-export of the matrix's runbook |
| `prompt.ts` | Builds the agent's system prompt from canonical standards + matrix row |
| `llm.ts` | Self-contained Anthropic SDK wrapper + judgment functions (`mapTerminology`, `convertUnit`) |
| `storage.ts` | Direct file I/O for learned mappings under `data/baseline.normalization/` |
| `index.ts` | `runNormalization(input, side, ctx)` |

## Runbook

1. **validate-input** (Std 2)
2. **normalize-entities** (Std 5) — learned-mapping lookup, deterministic.
3. **normalize-terminology** (Std 3 + Std 5) — learned-mapping lookup for known labels; LLM judgment with citation for novel labels; new rule appended to storage.
4. **normalize-units-and-periods** (Std 5) — identity-pass when raw unit equals target; LLM judgment with rationale when it differs. No hardcoded FX in the framework.
5. **resolve-duplicates** (Std 4 + Std 8) — detect contradictions on `(canonicalEntity, canonicalMetric, period)`.
6. **validate-output** (Std 7) — confidence + validation status.
7. **handoff** (Std 11) — analytics-ready dataset for the Resolution agent.
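
As an illustration of step 5, the following is a minimal sketch of contradiction detection on the `(canonicalEntity, canonicalMetric, period)` key. The `NormalizedFact` and `Contradiction` shapes and the `findContradictions` name are assumptions made for this example; the agent's actual types live in `schema.ts` and may differ.

```typescript
// Hypothetical record shape, for illustration only.
interface NormalizedFact {
  canonicalEntity: string;
  canonicalMetric: string;
  period: string;
  value: number;
}

// Hypothetical result shape: one entry per key that carries conflicting values.
interface Contradiction {
  key: string;      // JSON-encoded [canonicalEntity, canonicalMetric, period]
  values: number[]; // distinct values seen for that key, in first-seen order
}

// Group facts by (canonicalEntity, canonicalMetric, period) and report every
// key that has more than one distinct value. Purely deterministic.
function findContradictions(facts: NormalizedFact[]): Contradiction[] {
  const valuesByKey: Record<string, number[]> = {};
  for (const fact of facts) {
    const key = JSON.stringify([
      fact.canonicalEntity,
      fact.canonicalMetric,
      fact.period,
    ]);
    const values = valuesByKey[key] || (valuesByKey[key] = []);
    // Exact duplicates are not contradictions, so store each value once.
    if (values.indexOf(fact.value) === -1) values.push(fact.value);
  }
  return Object.keys(valuesByKey)
    .filter((key) => valuesByKey[key].length > 1)
    .map((key) => ({ key, values: valuesByKey[key] }));
}
```

Encoding the key with `JSON.stringify` avoids collisions that a plain delimiter join could produce when an entity or metric name contains the delimiter.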

## Mixed intelligence

Steps 1, 2, 3 (lookup branch), 4 (identity branch), 5, 6, and 7 are deterministic.
Steps 3 (novel-label branch) and 4 (unit-conversion branch) call the LLM.
Without `ANTHROPIC_API_KEY`, novel labels are reported as `unmapped-term`
unresolved issues and escalated for human-in-the-loop (HITL) review.
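
The branching described above for step 3 can be sketched roughly as follows. This is not the module's real API: `resolveTerm`, `Judge`, and `TermResult` are hypothetical names, the judge is synchronous for brevity (a real LLM call would be asynchronous), lookup is assumed to be an exact string match, and an in-memory record stands in for the learned mappings kept by `storage.ts`. Passing no judge models the case where `ANTHROPIC_API_KEY` is missing.

```typescript
// Hypothetical stand-in for an LLM judgment function such as `mapTerminology`.
type Judge = (rawLabel: string) => { canonical: string; citation: string };

// Hypothetical result shape for one label.
interface TermResult {
  rawLabel: string;
  canonical: string | null;
  source: "learned-mapping" | "llm-judgment" | "unresolved";
  citation?: string;
  issue?: "unmapped-term";
  escalateToHuman: boolean;
}

function resolveTerm(
  rawLabel: string,
  learned: Record<string, string>, // in-memory stand-in for stored mappings
  judge?: Judge,                   // undefined models a missing API key
): TermResult {
  // Deterministic branch: the label is already in the learned mappings.
  if (Object.prototype.hasOwnProperty.call(learned, rawLabel)) {
    return {
      rawLabel,
      canonical: learned[rawLabel],
      source: "learned-mapping",
      escalateToHuman: false,
    };
  }
  // No LLM available: report the label as unmapped and escalate.
  if (!judge) {
    return {
      rawLabel,
      canonical: null,
      source: "unresolved",
      issue: "unmapped-term",
      escalateToHuman: true,
    };
  }
  // LLM branch: take the judgment and record it as a new learned rule.
  const verdict = judge(rawLabel);
  learned[rawLabel] = verdict.canonical;
  return {
    rawLabel,
    canonical: verdict.canonical,
    source: "llm-judgment",
    citation: verdict.citation,
    escalateToHuman: false,
  };
}
```

Because the LLM branch writes its result back into the mappings, a label that was novel on one run resolves through the deterministic branch on the next.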