# bid-poc — Baseline Pillar Proof-of-Concept

This is a runnable proof-of-concept for the **Baseline pillar** of the
BID (Baseline / Intelligence / Decision) agent framework. It demonstrates
the framework's architecture end-to-end with three live Baseline agents,
and is structured so that the Intelligence and Decision pillars can be
added later **without changing the orchestrator's outer loop**, because
the pipeline is data, not code.

## What this POC demonstrates

- The **12 universal operational standards** that every BID agent must
  satisfy (objectives, inputs, decision logic, rules, methods/tools,
  processing runbook, validation/confidence, conditional triggers,
  HITL escalation, repository write-back, handoff envelope, failure
  handling) — encoded as typed contracts in `src/standards.ts` and
  enforced at runtime.
- The three **Baseline agents** running end-to-end against
  **unstructured** raw source documents (HTML / fixed-width text /
  narrative prose), with full lineage propagation, validation/confidence
  tiers, and structured failure handling.
- A pluggable, capability-typed **tool layer** under `src/tools/` —
  retrieval, parser, taxonomy, AI-with-citation (via `@anthropic-ai/sdk`,
  with a deterministic fallback when no API key is set).
- A **pipeline-as-data** model (`src/pipeline.ts`) so adding
  Intelligence agents or HITL checkpoints is append-only.
- An **audit trail** written to `output/run-<timestamp>.json` after each
  run, capturing every handoff, every escalation, every learned rule,
  every exception, and every trace entry.

## The 12 standards (one-liner each)

1. **Objective** — single clear responsibility, explicit boundaries, downstream purpose.
2. **Inputs** — structured, machine-readable; lineage + confidence persist.
3. **Decision logic** — explicit, deterministic where possible, every decision recorded.
4. **Rules & constraints** — preserve raw / lineage / audit; no fabrication; approved tools only. *Baseline-specific:* no strategic insight, no benchmarking, no maturity scoring, no recommendations.
5. **Methods & tools** — capability-based, approved + connected, lineage-preserving.
6. **Processing** — modular, repeatable, replayable; numbered runbook.
7. **Validation & confidence** — every output carries validationStatus + confidence (0–1 + tier).
8. **Conditional triggers** — explicit, named exception categories; traceable + context-preserving.
9. **HITL escalation** — defined thresholds; escalations carry full context + recommended reviewer role.
10. **Repository write-back** — agents *declare*; orchestrator *persists*.
11. **Handoff** — standardized envelope; downstream never reconstructs upstream context.
12. **Failure handling** — fail safely; `MAX_RETRIES=3`, `MAX_RECURSION_DEPTH=5`; structured `FailureObject`.

See `src/standards.ts` for the typed contracts and `STANDARDS_SUMMARY`
(also embedded in every run's audit JSON).
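
As an illustration of standard 12, the sketch below shows one way a step could be retried and its final error turned into data instead of an exception. It is not the code in `src/standards.ts`: the `FailureObject` fields, the `StepResult` type, and the `runWithRetries` helper are hypothetical, and it assumes `MAX_RETRIES=3` means three attempts in total.

```typescript
// Hypothetical shapes; the real contracts live in src/standards.ts and src/types.ts.
const MAX_RETRIES = 3; // value quoted in standard 12; assumed to mean 3 attempts in total

interface FailureObject {
  step: string;
  attempts: number;
  reason: string;
}

type StepResult<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; failure: FailureObject };

// Runs a step up to MAX_RETRIES times and converts the last error into a FailureObject.
function runWithRetries<T>(step: string, fn: () => T): StepResult<T> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      return { ok: true, value: fn(), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, failure: { step, attempts: MAX_RETRIES, reason: lastError } };
}

// A step that fails twice and then succeeds on the third attempt.
let calls = 0;
const flaky = runWithRetries<string>("fetch-outside-source", () => {
  calls += 1;
  if (calls < 3) throw new Error("timeout");
  return "payload";
});

// A step that never succeeds ends as a structured failure, not a thrown error.
const broken = runWithRetries<string>("parse", () => {
  throw new Error("unparseable");
});
```

Returning a discriminated union keeps the caller's handling explicit: the orchestrator can persist `failure` without a `try`/`catch` of its own.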

## The three Baseline agents

| Agent | Objective | Output |
|---|---|---|
| `baseline.source-extraction` | Acquire and structure source data from approved sources. Parses **unstructured** raw payloads at runtime. | `(label, value, unit)` triples with provenance + comparability notes. |
| `baseline.normalization` | Standardize extracted records into a canonical taxonomy, unit, and entity space. Raw values preserved alongside. | Records keyed by `(canonicalEntity, canonicalMetric, period)` with values in `USD_MM`. |
| `baseline.resolution` | Resolve exceptions, ambiguity, conflicts, and low-confidence outputs. Escalates residuals to HITL. | Resolved dataset + escalation package. |
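
The normalization row above can be sketched as a small pure function. This is an illustration under stated assumptions, not the agent's implementation: the record shapes, the `normalize` helper, the `|`-joined key format, and every unit code other than `USD_MM` are hypothetical. The two label mappings are the seed mappings used by the demo.

```typescript
// Hypothetical shapes for illustration; the real schemas live in each agent's schema.ts.
type RawUnit = "USD" | "USD_K" | "USD_MM" | "USD_BN"; // codes other than USD_MM are assumed

interface ExtractedRecord {
  entity: string;
  label: string;
  value: number;
  unit: RawUnit;
  period: string;
}

interface NormalizedRecord {
  key: string; // canonicalEntity|canonicalMetric|period (separator is an assumption)
  valueUsdMm: number;
  raw: ExtractedRecord; // raw values preserved alongside the canonical ones
}

// Number of US dollars represented by one unit of each code.
const DOLLARS_PER_UNIT: Record<RawUnit, number> = {
  USD: 1,
  USD_K: 1e3,
  USD_MM: 1e6,
  USD_BN: 1e9,
};

// Seed mappings from the demo JobRequest; keys are lower-cased labels.
const LABEL_TO_METRIC = new Map<string, string>([
  ["net revenues", "revenue"],
  ["total revenue", "revenue"],
]);

function normalize(rec: ExtractedRecord): NormalizedRecord | undefined {
  const metric = LABEL_TO_METRIC.get(rec.label.trim().toLowerCase());
  if (metric === undefined) return undefined; // unmapped labels are left for resolution
  const entity = rec.entity.trim().toUpperCase();
  return {
    key: `${entity}|${metric}|${rec.period}`,
    valueUsdMm: (rec.value * DOLLARS_PER_UNIT[rec.unit]) / 1e6,
    raw: rec,
  };
}

const normalized = normalize({
  entity: "Acme",
  label: "Net revenues",
  value: 1.25,
  unit: "USD_BN",
  period: "FY-2024",
});
const unmapped = normalize({
  entity: "Acme",
  label: "Goodwill",
  value: 10,
  unit: "USD_MM",
  period: "FY-2024",
});
```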

Each agent has the same five-file layout under
`src/agents/baseline/<agent-name>/`:
`schema.ts`, `runbook.ts`, `tools.ts`, `index.ts`, `README.md`.

## How to run

```bash
cd bid-poc
npm install

# Canned demo (fixed JobRequest, 3 entities, revenue, FY-2024):
npm run demo

# Free-form prompt — interpreted into a JobRequest, then run end-to-end:
npm run ask -- "How much revenue did ACME and Globex report in 2024?"

# Re-render the latest JSON into HTML:
npm run report
```

`npm run ask` accepts any natural-language prompt. It extracts the
entities the mock retrieval layer knows about (`ACME`, `GLOBEX`,
`INITECH`), a year from the prompt (falls back to `FY-2024`), and a
metric keyword (`revenue` / `sales` / etc.), builds a `JobRequest`, and
runs the full Baseline pipeline. Each run writes both
`output/ask-<timestamp>.json` and `output/ask-<timestamp>.html`.
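
The interpretation step described above might look roughly like the following. This is a minimal sketch, not the script's code: the `JobRequest` shape, the `interpretPrompt` name, the keyword table (including mapping `sales` to `revenue`), and the default metric are assumptions made for illustration.

```typescript
// Hypothetical JobRequest shape; the real type lives in src/types.ts.
interface JobRequest {
  entities: string[];
  metric: string;
  period: string;
}

// Entities the mock retrieval layer knows about, as listed in this README.
const KNOWN_ENTITIES = ["ACME", "GLOBEX", "INITECH"];

// Assumed keyword table: each keyword maps to a canonical metric name.
const METRIC_KEYWORDS = new Map<string, string>([
  ["revenue", "revenue"],
  ["revenues", "revenue"],
  ["sales", "revenue"],
]);

function interpretPrompt(prompt: string): JobRequest {
  const upper = prompt.toUpperCase();
  const entities = KNOWN_ENTITIES.filter((e) => new RegExp("\\b" + e + "\\b").test(upper));

  // First four-digit year in the prompt, else the documented FY-2024 fallback.
  const year = prompt.match(/\b(?:19|20)\d{2}\b/);
  const period = year ? `FY-${year[0]}` : "FY-2024";

  // First word that is a known metric keyword; defaulting to revenue is an assumption.
  const words = prompt.toLowerCase().match(/[a-z]+/g) ?? [];
  const keyword = words.find((w) => METRIC_KEYWORDS.has(w));
  const metric = (keyword && METRIC_KEYWORDS.get(keyword)) || "revenue";

  return { entities, metric, period };
}

const fromReadmeExample = interpretPrompt("How much revenue did ACME and Globex report in 2024?");
const withFallbackYear = interpretPrompt("Initech sales last year");
const withOtherYear = interpretPrompt("ACME revenue in 2023");
```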

The script `scripts/run-demo.ts` builds a sample `JobRequest`:
- 3 entities (`ACME`, `GLOBEX`, `INITECH`)
- 1 metric (`revenue`)
- 1 source (`sec-edgar`)
- Period `FY-2024`
- 2 seed mappings (`Net revenues → revenue`, `Total revenue → revenue`)

Console output prints every agent's step with the standard it enforces,
e.g.:
```
=== Agent: baseline.source-extraction v0.1.0 ===
  Objective: Acquire and structure source data ...
  [baseline.source-extraction][Std 5] fetch-outside-source — fetching ACME from sec-edgar
  [baseline.source-extraction][Std 7] extract-required-elements — extracted 3 candidate(s) for ACME
  ...
```

If `ANTHROPIC_API_KEY` is set, the AI-with-citation tool will hit the
Claude API for novel label mappings. Otherwise it logs a notice and
falls back to a deterministic Jaccard-similarity matcher — the demo
runs offline either way.
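
For readers unfamiliar with Jaccard similarity, the sketch below shows the general idea on word tokens: the score is the size of the token intersection divided by the size of the token union. The tokenizer, the `0.5` threshold, and the `bestMatch` helper are assumptions for illustration; the fallback in `src/tools/ai.ts` may tokenize or threshold differently.

```typescript
// Splits a label into a set of lower-cased alphanumeric tokens.
function tokenize(label: string): Set<string> {
  return new Set(label.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 for two empty sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

interface LabelMatch {
  label: string;
  metric: string;
  score: number;
}

// Returns the known label most similar to the novel one, if it clears the threshold.
function bestMatch(
  novel: string,
  known: Array<[string, string]>,
  threshold = 0.5, // assumed cut-off, not taken from the project
): LabelMatch | undefined {
  const novelTokens = tokenize(novel);
  let best: LabelMatch | undefined;
  for (const [label, metric] of known) {
    const score = jaccard(novelTokens, tokenize(label));
    if (score >= threshold && (best === undefined || score > best.score)) {
      best = { label, metric, score };
    }
  }
  return best;
}

// The demo's two seed mappings.
const seeds: Array<[string, string]> = [
  ["Net revenues", "revenue"],
  ["Total revenue", "revenue"],
];

const matched = bestMatch("Total net revenues", seeds);
const unmatched = bestMatch("Operating lease cost", seeds);
```

Here "Total net revenues" shares two of three distinct tokens with "Net revenues" (score 2/3) but only one of four with "Total revenue" (score 1/4), so the first seed wins. Because the matcher is deterministic, the same label always produces the same mapping, which keeps offline runs replayable.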

## How to read the audit trail

After each run, the audit JSON (`output/run-<timestamp>.json`, or
`output/ask-<timestamp>.json` for `npm run ask`) contains:

- `analysisId`, `ok`, `failure` — top-level outcome.
- `finalHandoff` — the final Baseline-pillar output (analytics-ready
  dataset for the Intelligence pillar).
- `escalations[]` — HITL escalation packages with full context.
- `repositorySnapshot` — what got written back per Std 10:
  - `records[]` — persisted handoff records (one per agent).
  - `exceptions[]` — exception log (Std 8).
  - `learnedRules[]` — rules learned during this run (Std 10).
  - `overrides[]` — human overrides (Std 10; empty in the POC).
  - `failures[]` — structured `FailureObject`s.
- `trace[]` — per-step trace with the standard each step enforces.
- `standards[]` — the 12 universal standards summarized for self-describing audits.
- `pipeline[]` — the exact pipeline that was executed.
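
A quick way to get oriented in an audit file is to count what each section holds. The sketch below types only the keys listed above and treats every array element as opaque; the `AuditRun` interface, the `summarizeRun` helper, and the inline sample are hypothetical, and the value types of `analysisId` and `ok` are assumed.

```typescript
// Partial, hypothetical typing of the audit JSON: only keys listed in this README.
interface AuditRun {
  analysisId: string;
  ok: boolean;
  escalations: unknown[];
  repositorySnapshot: {
    records: unknown[];
    exceptions: unknown[];
    learnedRules: unknown[];
    overrides: unknown[];
    failures: unknown[];
  };
  trace: unknown[];
}

// Produces a one-line overview of a run.
function summarizeRun(run: AuditRun): string {
  const repo = run.repositorySnapshot;
  return [
    `${run.analysisId}: ${run.ok ? "ok" : "failed"}`,
    `records=${repo.records.length}`,
    `exceptions=${repo.exceptions.length}`,
    `escalations=${run.escalations.length}`,
    `failures=${repo.failures.length}`,
    `traceEntries=${run.trace.length}`,
  ].join(" | ");
}

// Inline stand-in for the contents of an audit file (made-up values).
const sampleRun: AuditRun = JSON.parse(`{
  "analysisId": "demo-1",
  "ok": true,
  "escalations": [{}],
  "repositorySnapshot": {
    "records": [{}, {}, {}],
    "exceptions": [{}],
    "learnedRules": [],
    "overrides": [],
    "failures": []
  },
  "trace": [{}, {}]
}`);

const summary = summarizeRun(sampleRun);
```

Against a real run, the sample would be replaced by parsing the file contents, for example with `readFileSync` from `node:fs` followed by `JSON.parse`.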

## Web console

A small browsable console under `bid-poc/web/` renders past audit
runs, the cited insights, the pipeline trace, the token-usage table,
and a syntax-highlighted code browser over the framework source. Two
flavours:

### Local development (live runner)

```bash
cd bid-poc
ANTHROPIC_API_KEY=sk-ant-... npm run web
# → open http://localhost:4178
```

The local server adds a question box on the homepage that POSTs to
`/api/run`, spawns `npm run demo` as a child process, and streams the
pipeline's stdout to the browser over Server-Sent Events. When the
demo finishes, the page auto-navigates to the new run's detail view.
The live runner requires `ANTHROPIC_API_KEY` in the server's environment.
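
Server-Sent Events are plain text frames: each line of a payload is prefixed with `data: `, an optional `event:` line names the event, and a blank line ends the frame. The helper below is a sketch of how a chunk of child-process stdout could be framed; `toSseFrame` and the `end` event name are hypothetical and not taken from the server code.

```typescript
// Formats one chunk of text as a Server-Sent Events frame.
function toSseFrame(chunk: string, event?: string): string {
  // Normalise line endings and drop a single trailing newline so it does not
  // produce an empty data line.
  const lines = chunk.replace(/\r\n/g, "\n").replace(/\n$/, "").split("\n");
  const head = event ? `event: ${event}\n` : "";
  // One "data:" field per line; the blank line terminates the frame.
  return head + lines.map((line) => `data: ${line}`).join("\n") + "\n\n";
}

const stdoutFrame = toSseFrame("step one\nstep two\n");
const finalFrame = toSseFrame("done", "end"); // "end" is an assumed event name
```

A server would write such frames to a response sent with `Content-Type: text/event-stream` and keep the connection open until the child process exits.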

### Static build (Cloudflare Pages)

```bash
cd bid-poc
npm install
npm run build:static
# → outputs bid-poc/dist/
```

`scripts/build-static.ts` reuses the same view functions as the live
server (mode `"static"`) and emits a self-contained tree:

```
dist/
  index.html                       # landing + run list
  run/<run-id>/index.html          # one page per past run
  code/index.html                  # curated entry points + tree
  code/<file-path>/index.html      # one page per browsable file
  static/                          # CSS + client JS (live-run form handler bails out)
  _redirects + 404.html            # friendly fallback
  build-meta.json
```

The static build drops the live-run form and replaces it with a
notice pointing at the local development version. Everything else
(run details, cited insights, code browser) works identically.

**Cloudflare Pages settings**

| Setting                      | Value                                              |
|------------------------------|----------------------------------------------------|
| Build command                | `cd bid-poc && npm install && npm run build:static`|
| Build output directory       | `bid-poc/dist`                                     |
| Environment variables        | _none required_ — build is deterministic from `output/run-*.json` checked into the repo |
| Root directory (optional)    | leave at repo root                                 |

Each new commit triggers a rebuild; new runs in `bid-poc/output/`
get a page automatically.

## How to extend

The framework is built so that each kind of extension touches only a
small, well-defined set of seams.

### Add an Intelligence agent

1. Create `src/agents/intelligence/<agent-name>/` with the five-file
   structure (`schema.ts`, `runbook.ts`, `tools.ts`, `index.ts`, `README.md`).
2. Implement the 12 universals (declare an `AgentStandardsContract`,
   return `AgentResult<T>`), and layer the 5 Intelligence-specific
   standards on top.
3. Add one switch arm in `src/orchestrator.ts` mapping the agent's FQN
   to its executor, and register its contract.
4. Append to `src/pipeline.ts`:
   ```ts
   { kind: 'agent', pillar: 'intelligence', agent: '<agent-name>' }
   ```

The Baseline pillar and the orchestrator's outer loop don't change.
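
To make the pipeline-as-data idea concrete, the sketch below walks a list of steps and dispatches each agent by its fully qualified name. It is illustrative only: the `PipelineStep` type mirrors the snippets in this section, but the executor signature, the `Map` registry (the README describes a switch in `src/orchestrator.ts`), and the `trend-summary` agent are assumptions.

```typescript
type PipelineStep =
  | { kind: "agent"; pillar: "baseline" | "intelligence" | "decision"; agent: string }
  | { kind: "checkpoint"; name: string };

// Hypothetical executor: takes the log so far and returns the extended log.
type Executor = (log: string[]) => string[];
const record = (fqn: string): Executor => (log) => [...log, `ran:${fqn}`];

const pipeline: PipelineStep[] = [
  { kind: "agent", pillar: "baseline", agent: "source-extraction" },
  { kind: "agent", pillar: "baseline", agent: "normalization" },
  { kind: "agent", pillar: "baseline", agent: "resolution" },
];

// Registry keyed by FQN; stands in for the orchestrator's switch.
const executors = new Map<string, Executor>([
  ["baseline.source-extraction", record("baseline.source-extraction")],
  ["baseline.normalization", record("baseline.normalization")],
  ["baseline.resolution", record("baseline.resolution")],
]);

// The outer loop: it never changes when steps are appended.
function runPipeline(steps: PipelineStep[], registry: Map<string, Executor>): string[] {
  let log: string[] = [];
  for (const step of steps) {
    if (step.kind === "checkpoint") {
      log = [...log, `checkpoint:${step.name}`];
      continue;
    }
    const fqn = `${step.pillar}.${step.agent}`;
    const exec = registry.get(fqn);
    if (exec === undefined) {
      log = [...log, `missing:${fqn}`];
      break; // fail safely instead of skipping an unknown agent
    }
    log = exec(log);
  }
  return log;
}

const baselineOnly = runPipeline(pipeline, executors);

// Extending the pipeline: one registry entry plus one appended step.
executors.set("intelligence.trend-summary", record("intelligence.trend-summary"));
pipeline.push({ kind: "agent", pillar: "intelligence", agent: "trend-summary" });
const withIntelligence = runPipeline(pipeline, executors);
```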

### Add a HITL checkpoint

Append a step to `src/pipeline.ts`:
```ts
{ kind: 'checkpoint', name: 'records-review' }
```
Place it between Baseline and Intelligence to require analyst review of
resolved records before insight generation.
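
One way to picture the effect of a checkpoint is as a gate: agents after it do not run until a reviewer has approved it. The sketch below computes which agents are runnable given a list of approved checkpoint names. The `planUntilReview` helper, the approval list, and the `trend-summary` agent are hypothetical; the README does not describe how approvals are stored.

```typescript
type PipelineStep =
  | { kind: "agent"; pillar: "baseline" | "intelligence" | "decision"; agent: string }
  | { kind: "checkpoint"; name: string };

interface Plan {
  runnable: string[]; // agent FQNs that may run now
  blockedOn: string | undefined; // first unapproved checkpoint, if any
}

// Collects agents in order and stops at the first checkpoint that lacks approval.
function planUntilReview(steps: PipelineStep[], approved: string[]): Plan {
  const runnable: string[] = [];
  for (const step of steps) {
    if (step.kind === "checkpoint") {
      if (approved.indexOf(step.name) === -1) {
        return { runnable, blockedOn: step.name };
      }
      continue;
    }
    runnable.push(`${step.pillar}.${step.agent}`);
  }
  return { runnable, blockedOn: undefined };
}

const gatedPipeline: PipelineStep[] = [
  { kind: "agent", pillar: "baseline", agent: "source-extraction" },
  { kind: "agent", pillar: "baseline", agent: "normalization" },
  { kind: "agent", pillar: "baseline", agent: "resolution" },
  { kind: "checkpoint", name: "records-review" },
  { kind: "agent", pillar: "intelligence", agent: "trend-summary" }, // hypothetical agent
];

const beforeReview = planUntilReview(gatedPipeline, []);
const afterReview = planUntilReview(gatedPipeline, ["records-review"]);
```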

### Add a new tool

Implement the tool's interface in `src/tools/<tool>.ts`, exporting:
- a `ToolDescriptor` (name, capability, `approved: true`) so the agent
  can declare it in its `toolset` (Std 5),
- one or more pure functions / classes that return structured results
  (Std 12: never throw across the boundary).

Re-export from `src/tools/index.ts`.
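
A tool module following this pattern might look like the sketch below. The `ToolDescriptor` fields are the three named above; the `ToolResult` type, the `amount-parser` tool, and its `parser` capability value are hypothetical, and the real interfaces in `src/tools/` may carry more fields.

```typescript
// Descriptor with the three fields named in this README; other fields may exist.
interface ToolDescriptor {
  name: string;
  capability: string;
  approved: boolean;
}

// Assumed result shape: errors travel as data so nothing is thrown across the boundary.
type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

// In a real tool module these two declarations would be exported.
const amountParserDescriptor: ToolDescriptor = {
  name: "amount-parser", // hypothetical tool name
  capability: "parser",
  approved: true,
};

// Pure function: parses strings such as "$1,234.50" into a number.
function parseAmount(text: string): ToolResult<number> {
  const cleaned = text.replace(/[$,\s]/g, "");
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) {
    return { ok: false, error: `not a number: "${text}"` };
  }
  return { ok: true, value: Number(cleaned) };
}

const parsedAmount = parseAmount("$1,234.50");
const rejectedAmount = parseAmount("n/a");
```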

### Swap the mock retrieval for a real fetcher

Change only `src/tools/retrieval.ts`. The `fetchRaw(req)` signature
returns `RetrievalResult` with a raw `body` string and a `shape` hint —
hook this up to a real HTTP client, SEC EDGAR API, or vendor SDK and
nothing in `src/agents/**` has to change.
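
As a rough sketch of that swap, the code below keeps the result shape described above and injects the HTTP call, so the same function works with any client. The request fields, the `Shape` values, the `detectShape` heuristic, and the extra `httpGet` parameter are assumptions; the project's `fetchRaw(req)` takes only the request.

```typescript
// Shape hints are assumed names for the three raw formats this README mentions.
type Shape = "html" | "fixed-width" | "prose";

interface RetrievalRequest {
  entity: string;
  source: string;
  url: string; // hypothetical field
}

interface RetrievalResult {
  body: string;
  shape: Shape;
}

// Crude heuristic: markup means HTML, mostly column-aligned lines mean fixed-width.
function detectShape(body: string): Shape {
  if (/<\s*(html|table|div|p)\b/i.test(body)) return "html";
  const lines = body.split("\n").filter((line) => line.trim() !== "");
  const columnar = lines.filter((line) => /\S {2,}\S/.test(line)).length;
  if (lines.length > 0 && columnar / lines.length >= 0.5) return "fixed-width";
  return "prose";
}

// Any function that resolves a URL to its text body: an HTTP client, an SDK wrapper, a stub.
type HttpGet = (url: string) => Promise<string>;

async function fetchRaw(req: RetrievalRequest, httpGet: HttpGet): Promise<RetrievalResult> {
  const body = await httpGet(req.url);
  return { body, shape: detectShape(body) };
}

const htmlShape = detectShape("<html><body><table><tr><td>Net revenues</td></tr></table></body></html>");
const fixedWidthShape = detectShape("Net revenues      1,250\nTotal assets      9,800");
const proseShape = detectShape("ACME reported revenue of $1.25 billion in fiscal 2024.");
```

Injecting `httpGet` also makes the fetcher easy to test with a stub that returns canned payloads.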

## Project layout

```
bid-poc/
├── package.json
├── tsconfig.json
├── README.md                       ← this file
├── scripts/
│   ├── run-demo.ts
│   └── build-static.ts             ← static site build
├── web/                            ← web console
├── output/                         ← run-*.json audit trails
└── src/
    ├── standards.ts                ← 12 universal standards (typed contracts)
    ├── types.ts                    ← JobRequest, Handoff, Lineage, FailureObject
    ├── pipeline.ts                 ← pipeline-as-data
    ├── orchestrator.ts             ← walks pipeline, persists, escalates
    ├── repository.ts               ← in-memory write-back
    ├── tools/
    │   ├── retrieval.ts            ← mock (swap for real fetcher here)
    │   ├── parser.ts               ← multi-shape parser
    │   ├── taxonomy.ts             ← taxonomy / ontology / entity resolver
    │   ├── ai.ts                   ← @anthropic-ai/sdk + deterministic fallback
    │   └── index.ts
    └── agents/
        ├── baseline/
        │   ├── source-extraction/  ← {schema,runbook,tools,index}.ts + README.md
        │   ├── normalization/
        │   └── resolution/
        ├── intelligence/           ← README placeholder (how to add)
        └── decision/               ← README placeholder
```