Decision
The Ingest Studio is a first-class platform surface, not admin tooling. Components:
- Provider Registry — every configured data source with auth, polling, rate limits, license metadata, schema fingerprint
- Sample Explorer — fetch and inspect raw + AI-extracted + mapped + downstream-triple views at any layer
- Mapping Workshop — versioned per-(provider × OLN entity-type) mapping documents covering field paths, transforms, confidence floors, validation rules, and approval status
- Re-run Console — re-run controls at every dimension (triple, entity, Franchise, entity-type, provider, time window, mapping version, lifecycle state), combinable, with mandatory dry-run preview before commit
- Drift Detector — scheduled re-fetch and schema diff per provider, with AI-proposed mapping updates routed to the Creator
- Mapping Lineage Graph — versioned mappings make "what changes when I update this transform?" a queryable question
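To make the Re-run Console's "combinable, with mandatory dry-run preview before commit" requirement concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: the `Triple` fields, the `RerunSelector` and `RerunConsole` names, and the in-memory list standing in for the triple store. It covers only some of the listed dimensions, and it assumes that dimensions combine with AND semantics.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Triple:
    """Hypothetical, simplified view of a stored triple's metadata."""
    triple_id: str
    entity_type: str
    franchise: str
    provider: str
    mapping_version: int
    lifecycle_state: str
    ingested_at: int  # epoch seconds


@dataclass(frozen=True)
class RerunSelector:
    """One field per dimension; None means 'do not filter on this one'."""
    entity_type: Optional[str] = None
    franchise: Optional[str] = None
    provider: Optional[str] = None
    mapping_version: Optional[int] = None
    lifecycle_state: Optional[str] = None
    window: Optional[Tuple[int, int]] = None  # [start, end) in epoch seconds

    def matches(self, t: Triple) -> bool:
        # Dimensions combine with AND: every set field must match.
        return all([
            self.entity_type is None or t.entity_type == self.entity_type,
            self.franchise is None or t.franchise == self.franchise,
            self.provider is None or t.provider == self.provider,
            self.mapping_version is None
            or t.mapping_version == self.mapping_version,
            self.lifecycle_state is None
            or t.lifecycle_state == self.lifecycle_state,
            self.window is None
            or self.window[0] <= t.ingested_at < self.window[1],
        ])


class RerunConsole:
    def __init__(self, triples: List[Triple]) -> None:
        self._triples = list(triples)
        self._previewed: Set[RerunSelector] = set()

    def dry_run(self, selector: RerunSelector) -> List[str]:
        """Return the IDs a commit would touch, and record the preview."""
        self._previewed.add(selector)
        return [t.triple_id for t in self._triples if selector.matches(t)]

    def commit(self, selector: RerunSelector) -> List[str]:
        """Refuse to run unless this exact selector was previewed first."""
        if selector not in self._previewed:
            raise PermissionError("dry-run preview required before commit")
        self._previewed.discard(selector)  # each commit needs a fresh preview
        # A real console would enqueue re-processing here; the sketch only
        # returns the affected IDs.
        return [t.triple_id for t in self._triples if selector.matches(t)]
```

Because the selector is a frozen dataclass, it is hashable, so the console can check that the selector being committed is exactly the one that was previewed rather than a similar one.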
Operating principle: AI proposes, HI disposes, code enforces. No auto-approval threshold. Every mapping change requires explicit human action before it touches production triples.
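One way the "code enforces" half of that principle could look is sketched below. All names (`MappingVersion`, `Status`, `approve`, `apply_to_production`) are hypothetical, and the `"ai"` proposer label is an assumption. The point is that the AI confidence score is stored but never consulted by the gate, so there is no threshold to tune.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class MappingVersion:
    """Hypothetical record for one version of a provider x entity-type mapping."""
    provider: str
    entity_type: str
    version: int
    proposed_by: str            # "ai" or a human identifier (assumed labels)
    ai_confidence: float = 0.0  # informational only; the gate never reads it
    status: Status = Status.PROPOSED
    approved_by: Optional[str] = None


def approve(mapping: MappingVersion, reviewer: str) -> None:
    """Record an explicit human approval. The AI cannot approve."""
    if not reviewer or reviewer == "ai":
        raise PermissionError("approval requires a named human reviewer")
    mapping.status = Status.APPROVED
    mapping.approved_by = reviewer


def apply_to_production(mapping: MappingVersion) -> str:
    """Refuse any mapping that lacks a recorded human approval."""
    if mapping.status is not Status.APPROVED or mapping.approved_by is None:
        raise PermissionError(
            f"mapping v{mapping.version} is {mapping.status.value}; "
            "human approval required"
        )
    # A real system would write production triples here.
    return f"applied {mapping.provider}/{mapping.entity_type} v{mapping.version}"
```

Even a proposal with `ai_confidence=0.999` is rejected by `apply_to_production` until a human calls `approve`.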
Reasoning
Three forcing functions:
- Constitutional enforcement at the data layer. Provider attribution, lifecycle state, and citation tier are all mapping-layer concerns — the Studio is where governance meets ingestion. Decisions made here have Constitutional weight.
- Schema discipline prevents platform rot. Every reference platform that scaled without this layer eventually drowned in inconsistent data. Retrofitting it onto live data is brutal; building it first is cheap.
- Compounding moat. Curated mappings with provenance, versioning, and re-run capability are an asset that grows with every provider added. Future contributors inherit the framework when they onboard.
The "no auto-approval threshold" rule is non-negotiable. The minute that gate weakens, the schema starts drifting in subtle ways nobody catches until a Franchise Team revolts.