# bid-poc — Baseline Pillar Proof-of-Concept
This is a runnable proof-of-concept for the **Baseline pillar** of the
BID (Baseline / Intelligence / Decision) agent framework. It demonstrates
the framework's architecture end-to-end with three live Baseline agents,
and is structured so that the Intelligence and Decision pillars can be
added later **without changing the orchestrator's outer loop**: the
pipeline is data, not code.
## What this POC demonstrates
- The **12 universal operational standards** that every BID agent must
satisfy (objectives, inputs, decision logic, rules, methods/tools,
processing runbook, validation/confidence, conditional triggers,
HITL escalation, repository write-back, handoff envelope, failure
handling) — encoded as typed contracts in `src/standards.ts` and
enforced at runtime.
- The three **Baseline agents** running end-to-end against
**unstructured** raw source documents (HTML / fixed-width text /
narrative prose), with full lineage propagation, validation/confidence
tiers, and structured failure handling.
- A pluggable, capability-typed **tool layer** under `src/tools/` —
retrieval, parser, taxonomy, AI-with-citation (via `@anthropic-ai/sdk`,
with a deterministic fallback when no API key is set).
- A **pipeline-as-data** model (`src/pipeline.ts`) so adding
Intelligence agents or HITL checkpoints is append-only.
- An **audit trail** written to `output/run-<timestamp>.json` after each
run, capturing every handoff, every escalation, every learned rule,
every exception, and every trace entry.
## The 12 standards (one-liner each)
1. **Objective** — single clear responsibility, explicit boundaries, downstream purpose.
2. **Inputs** — structured, machine-readable; lineage + confidence persist.
3. **Decision logic** — explicit, deterministic where possible, every decision recorded.
4. **Rules & constraints** — preserve raw / lineage / audit; no fabrication; approved tools only. *Baseline-specific:* no strategic insight, no benchmarking, no maturity scoring, no recommendations.
5. **Methods & tools** — capability-based, approved + connected, lineage-preserving.
6. **Processing** — modular, repeatable, replayable; numbered runbook.
7. **Validation & confidence** — every output carries `validationStatus` + `confidence` (0–1 + tier).
8. **Conditional triggers** — explicit, named exception categories; traceable + context-preserving.
9. **HITL escalation** — defined thresholds; escalations carry full context + recommended reviewer role.
10. **Repository write-back** — agents *declare*; orchestrator *persists*.
11. **Handoff** — standardized envelope; downstream never reconstructs upstream context.
12. **Failure handling** — fail safely; `MAX_RETRIES=3`, `MAX_RECURSION_DEPTH=5`; structured `FailureObject`.
See `src/standards.ts` for the typed contracts and `STANDARDS_SUMMARY`
(also embedded in every run's audit JSON).
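As an illustration of Std 7, confidence tiering might look like the sketch below. The type names, field names, and the 0.85 / 0.6 cut-offs are assumptions made for this example; they are not taken from `src/standards.ts`.

```ts
// Hypothetical shapes; the real contracts live in src/standards.ts.
type ConfidenceTier = 'high' | 'medium' | 'low';

interface Confidence {
  score: number; // 0..1 inclusive
  tier: ConfidenceTier;
}

// Assumed cut-offs, chosen only for illustration.
const HIGH_CUTOFF = 0.85;
const MEDIUM_CUTOFF = 0.6;

// Map a raw 0..1 score to a score + tier pair, rejecting out-of-range input.
function toConfidence(score: number): Confidence {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`confidence score must be in [0, 1], got ${score}`);
  }
  const tier: ConfidenceTier =
    score >= HIGH_CUTOFF ? 'high' : score >= MEDIUM_CUTOFF ? 'medium' : 'low';
  return { score, tier };
}

// Usage: attach the tier when an agent emits a value.
const exampleConfidence = toConfidence(0.72); // tier is 'medium' under the assumed cut-offs
```

Keeping the numeric score next to the tier lets a downstream step apply its own threshold without losing the original value.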
## The three Baseline agents
| Agent | Objective | Output |
|---|---|---|
| `baseline.source-extraction` | Acquire and structure source data from approved sources. Parses **unstructured** raw payloads at runtime. | `(label, value, unit)` triples with provenance + comparability notes. |
| `baseline.normalization` | Standardize extracted records into a canonical taxonomy, unit, and entity space. Raw values preserved alongside. | Records keyed by `(canonicalEntity, canonicalMetric, period)` with values in `USD_MM`. |
| `baseline.resolution` | Resolve exceptions, ambiguity, conflicts, and low-confidence outputs. Escalates residuals to HITL. | Resolved dataset + escalation package. |
Each agent has the same five-file layout under
`src/agents/baseline/<agent-name>/`:
`schema.ts`, `runbook.ts`, `tools.ts`, `index.ts`, `README.md`.
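To make the normalization output concrete, here is a minimal sketch of turning an extracted triple into a record keyed by `(canonicalEntity, canonicalMetric, period)` with a value in `USD_MM`. The interfaces, the unit codes other than `USD_MM`, and the upper-casing of entity names are assumptions for illustration, not the agent's actual schema.

```ts
// Hypothetical shapes and unit codes; the real schemas live in each agent's schema.ts.
type Unit = 'USD' | 'USD_K' | 'USD_MM' | 'USD_B';

interface ExtractedRecord {
  entity: string;
  label: string;
  value: number;
  unit: Unit;
  period: string;
}

interface NormalizedRecord {
  key: string; // canonicalEntity|canonicalMetric|period
  canonicalEntity: string;
  canonicalMetric: string;
  period: string;
  valueUsdMm: number;
  raw: ExtractedRecord; // raw value preserved alongside the normalized one
}

// Seed mappings from source labels (lower-cased) to canonical metrics.
const SEED_MAPPINGS = new Map<string, string>([
  ['net revenues', 'revenue'],
  ['total revenue', 'revenue'],
]);

// Convert a value in any supported unit to millions of USD.
function toUsdMm(value: number, unit: Unit): number {
  switch (unit) {
    case 'USD': return value / 1e6;
    case 'USD_K': return value / 1e3;
    case 'USD_MM': return value;
    case 'USD_B': return value * 1e3;
  }
}

// Returns null when the label has no known mapping, leaving it for a later resolution step.
function normalize(rec: ExtractedRecord): NormalizedRecord | null {
  const canonicalMetric = SEED_MAPPINGS.get(rec.label.trim().toLowerCase());
  if (canonicalMetric === undefined) return null;
  const canonicalEntity = rec.entity.trim().toUpperCase(); // assumed canonical form
  return {
    key: `${canonicalEntity}|${canonicalMetric}|${rec.period}`,
    canonicalEntity,
    canonicalMetric,
    period: rec.period,
    valueUsdMm: toUsdMm(rec.value, rec.unit),
    raw: rec,
  };
}
```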
## How to run
```bash
cd bid-poc
npm install
# Canned demo (fixed JobRequest, 3 entities, revenue, FY-2024):
npm run demo
# Free-form prompt — interpreted into a JobRequest, then run end-to-end:
npm run ask -- "How much revenue did ACME and Globex report in 2024?"
# Re-render the latest JSON into HTML:
npm run report
```
`npm run ask` accepts any natural-language prompt. It extracts the
entities the mock retrieval layer knows about (`ACME`, `GLOBEX`,
`INITECH`), a year from the prompt (falls back to `FY-2024`), and a
metric keyword (`revenue` / `sales` / etc.), builds a `JobRequest`, and
runs the full Baseline pipeline. Each run writes both
`output/ask-<timestamp>.json` and `output/ask-<timestamp>.html`.
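The interpretation step described above can be sketched roughly as follows. The `JobRequest` fields, the keyword table (including the choice to treat `sales` as `revenue`), and the default metric are assumptions for this example; the actual script may differ.

```ts
// Hypothetical JobRequest shape; the real type lives in src/types.ts.
interface JobRequest {
  entities: string[];
  metric: string;
  period: string;
}

const KNOWN_ENTITIES = ['ACME', 'GLOBEX', 'INITECH'];
const DEFAULT_PERIOD = 'FY-2024';
const DEFAULT_METRIC = 'revenue'; // assumed fallback

// Assumed keyword table: prompt word -> canonical metric.
const METRIC_KEYWORDS = new Map<string, string>([
  ['revenue', 'revenue'],
  ['revenues', 'revenue'],
  ['sales', 'revenue'],
]);

function interpretPrompt(prompt: string): JobRequest {
  const upper = prompt.toUpperCase();
  // Keep only entities that appear as whole words in the prompt.
  const entities = KNOWN_ENTITIES.filter((e) => new RegExp(`\\b${e}\\b`).test(upper));

  // First four-digit year in the prompt, else the default period.
  const year = prompt.match(/\b(?:19|20)\d{2}\b/);
  const period = year ? `FY-${year[0]}` : DEFAULT_PERIOD;

  // First word that matches a known metric keyword, else the default metric.
  const words = prompt.toLowerCase().match(/[a-z]+/g) ?? [];
  const metric =
    words.map((w) => METRIC_KEYWORDS.get(w)).find((m) => m !== undefined) ?? DEFAULT_METRIC;

  return { entities, metric, period };
}

// Usage with the prompt from the command above.
const exampleRequest = interpretPrompt('How much revenue did ACME and Globex report in 2024?');
```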
The script `scripts/run-demo.ts` builds a sample `JobRequest`:
- 3 entities (`ACME`, `GLOBEX`, `INITECH`)
- 1 metric (`revenue`)
- 1 source (`sec-edgar`)
- Period `FY-2024`
- 2 seed mappings (`Net revenues → revenue`, `Total revenue → revenue`)
Console output prints every agent's step with the standard it enforces,
e.g.:
```
=== Agent: baseline.source-extraction v0.1.0 ===
Objective: Acquire and structure source data ...
[baseline.source-extraction][Std 5] fetch-outside-source — fetching ACME from sec-edgar
[baseline.source-extraction][Std 7] extract-required-elements — extracted 3 candidate(s) for ACME
...
```
If `ANTHROPIC_API_KEY` is set, the AI-with-citation tool will hit the
Claude API for novel label mappings. Otherwise it logs a notice and
falls back to a deterministic Jaccard-similarity matcher — the demo
runs offline either way.
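For readers unfamiliar with the fallback, a token-level Jaccard matcher can be sketched as below. This is a generic illustration of the technique, not the code in `src/tools/ai.ts`; the function names and the 0.5 acceptance threshold are assumptions.

```ts
// Split a label into a set of lower-cased alphanumeric tokens.
function tokenize(label: string): Set<string> {
  return new Set(label.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 for two empty sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

interface LabelMatch {
  candidate: string;
  score: number;
}

// Pick the highest-scoring candidate at or above minScore; ties go to the
// earliest candidate, so the result is deterministic for a given input order.
function bestMatch(label: string, candidates: string[], minScore = 0.5): LabelMatch | null {
  const tokens = tokenize(label);
  let best: LabelMatch | null = null;
  for (const candidate of candidates) {
    const score = jaccard(tokens, tokenize(candidate));
    if (score >= minScore && (best === null || score > best.score)) {
      best = { candidate, score };
    }
  }
  return best;
}

// Usage: 'Total net revenues' shares 2 of 3 distinct tokens with 'Net revenues'.
const exampleMatch = bestMatch('Total net revenues', ['Net revenues', 'Operating income']);
```

Because the matcher depends only on its inputs, the same label always maps to the same candidate, which keeps offline runs reproducible.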
## How to read the audit trail
After each run, `output/run-<timestamp>.json` contains:
- `analysisId`, `ok`, `failure` — top-level outcome.
- `finalHandoff` — the final Baseline-pillar output (analytics-ready
dataset for the Intelligence pillar).
- `escalations[]` — HITL escalation packages with full context.
- `repositorySnapshot` — what got written back per Std 10:
  - `records[]` — persisted handoff records (one per agent).
  - `exceptions[]` — exception log (Std 8).
  - `learnedRules[]` — rules learned during this run (Std 10).
  - `overrides[]` — human overrides (Std 10; empty in the POC).
  - `failures[]` — structured `FailureObject`s.
- `trace[]` — per-step trace with the standard each step enforces.
- `standards[]` — the 12 universal standards summarized for self-describing audits.
- `pipeline[]` — the exact pipeline that was executed.
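A quick way to inspect a run is to reduce the audit file to a one-line summary. The sketch below types only a subset of the fields listed above and treats their element shapes as unknown; the `AuditRun` interface and `summarizeAudit` helper are hypothetical names.

```ts
// Hypothetical partial typing of the audit file; only the fields used here are declared.
interface AuditRun {
  analysisId: string;
  ok: boolean;
  escalations: unknown[];
  repositorySnapshot: {
    records: unknown[];
    exceptions: unknown[];
    learnedRules: unknown[];
  };
  trace: unknown[];
}

// Build a compact, human-readable summary of one run.
function summarizeAudit(run: AuditRun): string {
  const repo = run.repositorySnapshot;
  return [
    `analysis ${run.analysisId}: ${run.ok ? 'ok' : 'FAILED'}`,
    `records=${repo.records.length}`,
    `exceptions=${repo.exceptions.length}`,
    `learnedRules=${repo.learnedRules.length}`,
    `escalations=${run.escalations.length}`,
    `traceEntries=${run.trace.length}`,
  ].join(' | ');
}

// In a Node script the input would come from the audit file, for example:
//   const run = JSON.parse(readFileSync('output/run-<timestamp>.json', 'utf8')) as AuditRun;
```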
## Web console
A small browsable console under `bid-poc/web/` renders past audit
runs, the cited insights, the pipeline trace, the token-usage table,
and a syntax-highlighted code browser over the framework source. Two
flavours:
### Local development (live runner)
```bash
cd bid-poc
ANTHROPIC_API_KEY=sk-ant-... npm run web
# → open http://localhost:4178
```
The local server adds a question-box on the homepage that POSTs to
`/api/run`, spawns `npm run demo` as a child process, and streams the
pipeline's stdout to the browser over Server-Sent Events. When the
demo finishes, the page auto-navigates to the new run's detail view.
The live runner requires `ANTHROPIC_API_KEY` in the server's environment.
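Streaming stdout over Server-Sent Events comes down to wrapping each chunk in the SSE wire format. The helper below is a sketch of that framing only; the function name and the `log` event name are assumptions, and the server's real implementation may differ.

```ts
// Format one chunk of child-process output as a single SSE frame.
// Each line of the chunk becomes its own `data:` field, and a blank line ends the frame.
function toSseFrame(chunk: string, event = 'log'): string {
  const text = chunk.replace(/\r\n/g, '\n').replace(/\n$/, ''); // normalize, drop one trailing newline
  const data = text
    .split('\n')
    .map((line) => `data: ${line}`)
    .join('\n');
  return `event: ${event}\n${data}\n\n`;
}

// In a Node HTTP handler this would be used roughly as:
//   res.writeHead(200, { 'Content-Type': 'text/event-stream' });
//   child.stdout.on('data', (buf) => res.write(toSseFrame(buf.toString())));
```

On the browser side, an `EventSource` listener for the chosen event name receives the lines of each frame rejoined with newlines.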
### Static build (Cloudflare Pages)
```bash
cd bid-poc
npm install
npm run build:static
# → outputs bid-poc/dist/
```
`scripts/build-static.ts` reuses the same view functions as the live
server (mode `"static"`) and emits a self-contained tree:
```
dist/
  index.html                     # landing + run list
  run/<run-id>/index.html        # one page per past run
  code/index.html                # curated entry points + tree
  code/<file-path>/index.html    # one page per browsable file
  static/                        # CSS + client JS (live-run form handler bails out)
  _redirects + 404.html          # friendly fallback
  build-meta.json
```
The static build drops the live-run form and replaces it with a
notice pointing at the local development version. Everything else
(run details, cited insights, code browser) works identically.
**Cloudflare Pages settings**
| Setting | Value |
|------------------------------|----------------------------------------------------|
| Build command | `cd bid-poc && npm install && npm run build:static`|
| Build output directory | `bid-poc/dist` |
| Environment variables | _none required_ — build is deterministic from `output/run-*.json` checked into the repo |
| Root directory (optional) | leave at repo root |
Each new commit triggers a rebuild; new runs committed under
`bid-poc/output/` get a page automatically.
## How to extend
The framework is built so that each kind of extension touches a single
seam.
### Add an Intelligence agent
1. Create `src/agents/intelligence/<agent-name>/` with the five-file
   structure (`schema.ts`, `runbook.ts`, `tools.ts`, `index.ts`, `README.md`).
2. Implement the 12 universals (declare an `AgentStandardsContract`,
   return `AgentResult<T>`), and layer the 5 Intelligence-specific
   standards on top.
3. Add one switch arm in `src/orchestrator.ts` mapping the agent's FQN
   to its executor, and register its contract.
4. Append to `src/pipeline.ts`:
   ```ts
   { kind: 'agent', pillar: 'intelligence', agent: '<agent-name>' }
   ```
The Baseline pillar and the orchestrator's outer loop don't change.
### Add a HITL checkpoint
Append a step to `src/pipeline.ts`:
```ts
{ kind: 'checkpoint', name: 'records-review' }
```
Place it between Baseline and Intelligence to require analyst review of
resolved records before insight generation.
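To show why appending steps is enough, here is a minimal sketch of a loop that walks a pipeline made of `agent` and `checkpoint` steps. It uses a lookup table in place of the switch in `src/orchestrator.ts`, and the `Executor`, `RunOutcome`, and `runPipeline` names are hypothetical.

```ts
type Pillar = 'baseline' | 'intelligence' | 'decision';

// Step shapes follow the two literals shown above.
type PipelineStep =
  | { kind: 'agent'; pillar: Pillar; agent: string }
  | { kind: 'checkpoint'; name: string };

// Hypothetical executor signature: takes the previous handoff, returns the next one.
type Executor = (input: unknown) => unknown;

interface RunOutcome {
  ok: boolean;
  output: unknown;
  trace: string[];
  failure?: string;
}

function runPipeline(
  steps: PipelineStep[],
  executors: Map<string, Executor>,
  input: unknown,
): RunOutcome {
  const trace: string[] = [];
  let current = input;
  for (const step of steps) {
    if (step.kind === 'checkpoint') {
      // A real run would pause for analyst review here; the sketch only records it.
      trace.push(`checkpoint:${step.name}`);
      continue;
    }
    const fqn = `${step.pillar}.${step.agent}`;
    const execute = executors.get(fqn);
    if (execute === undefined) {
      // Fail safely with a structured outcome instead of throwing.
      return { ok: false, output: current, trace, failure: `no executor for ${fqn}` };
    }
    current = execute(current);
    trace.push(`agent:${fqn}`);
  }
  return { ok: true, output: current, trace };
}
```

The loop never names a specific agent, so a new pillar only needs an entry in the step list and a registered executor.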
### Add a new tool
Implement the tool's interface in `src/tools/<tool>.ts`, exporting:
- a `ToolDescriptor` (name, capability, `approved: true`) so the agent
can declare it in its `toolset` (Std 5),
- one or more pure functions / classes that return structured results
(Std 12 — never throw across the boundary).
Re-export from `src/tools/index.ts`.
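A tool module along these lines might look like the sketch below. The `ToolDescriptor` fields follow the list above, while the `Capability` union, the `ToolResult` type, and the `amount-parser` tool itself are hypothetical examples.

```ts
// Assumed capability names, based on the tool layer described earlier.
type Capability = 'retrieval' | 'parser' | 'taxonomy' | 'ai';

interface ToolDescriptor {
  name: string;
  capability: Capability;
  approved: boolean;
}

// Hypothetical result type: failures are returned as data, never thrown.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: { code: string; message: string } };

export const amountParserDescriptor: ToolDescriptor = {
  name: 'amount-parser',
  capability: 'parser',
  approved: true,
};

// Parse strings such as "$1,234.50" into a number, reporting bad input as a structured error.
export function parseAmount(text: string): ToolResult<number> {
  const cleaned = text.replace(/[$,\s]/g, '');
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) {
    return {
      ok: false,
      error: { code: 'UNPARSEABLE_AMOUNT', message: `cannot parse "${text}"` },
    };
  }
  return { ok: true, value: Number(cleaned) };
}
```

Callers branch on `ok` and can turn the error object into whatever failure record the framework expects.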
### Swap the mock retrieval for a real fetcher
Change only `src/tools/retrieval.ts`. `fetchRaw(req)` returns a
`RetrievalResult` with a raw `body` string and a `shape` hint. Hook it
up to a real HTTP client, the SEC EDGAR API, or a vendor SDK, and
nothing in `src/agents/**` has to change.
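One possible shape for a real fetcher is sketched below, with the HTTP client injected so that any library can be plugged in. Everything except the `body` and `shape` fields is an assumption: the request fields, the `Fetched` wrapper, the two shape values, and the async signature are illustrative and may not match the mock's actual types.

```ts
// Hypothetical shapes; only `body` and `shape` come from the description above.
type Shape = 'html' | 'text';

interface RetrievalRequest {
  entity: string;
  url: string; // assumed: the caller supplies the document URL
}

interface RetrievalResult {
  body: string;
  shape: Shape;
}

type Fetched = { ok: true; result: RetrievalResult } | { ok: false; error: string };

// Minimal HTTP abstraction so any client or vendor SDK can be adapted to it.
type HttpGet = (url: string) => Promise<{ status: number; contentType: string; text: string }>;

// Derive a coarse shape hint from the response content type.
function shapeFromContentType(contentType: string): Shape {
  return /html/i.test(contentType) ? 'html' : 'text';
}

// Fetch one raw document; network and HTTP errors are returned, not thrown.
async function fetchRaw(req: RetrievalRequest, httpGet: HttpGet): Promise<Fetched> {
  try {
    const res = await httpGet(req.url);
    if (res.status < 200 || res.status >= 300) {
      return { ok: false, error: `HTTP ${res.status} for ${req.entity}` };
    }
    return {
      ok: true,
      result: { body: res.text, shape: shapeFromContentType(res.contentType) },
    };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Injecting `httpGet` also makes the fetcher easy to exercise in tests with a stub that returns canned responses.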
## Project layout
```
bid-poc/
├── package.json
├── tsconfig.json
├── README.md                    ← this file
├── scripts/
│   ├── run-demo.ts
│   └── build-static.ts          ← static build for the web console
├── web/                         ← browsable web console
├── output/                      ← run-*.json audit trails
└── src/
    ├── standards.ts             ← 12 universal standards (typed contracts)
    ├── types.ts                 ← JobRequest, Handoff, Lineage, FailureObject
    ├── pipeline.ts              ← pipeline-as-data
    ├── orchestrator.ts          ← walks pipeline, persists, escalates
    ├── repository.ts            ← in-memory write-back
    ├── tools/
    │   ├── retrieval.ts         ← mock (swap for real fetcher here)
    │   ├── parser.ts            ← multi-shape parser
    │   ├── taxonomy.ts          ← taxonomy / ontology / entity resolver
    │   ├── ai.ts                ← @anthropic-ai/sdk + deterministic fallback
    │   └── index.ts
    └── agents/
        ├── baseline/
        │   ├── source-extraction/   ← {schema,runbook,tools,index}.ts + README.md
        │   ├── normalization/
        │   └── resolution/
        ├── intelligence/        ← README placeholder (how to add)
        └── decision/            ← README placeholder
```