Network/Journal/Type Trust Extraction Gate
Entry 038

The Way — a Fact's type must be trusted before it enters the graph

Date: 2026-05-27
Status: Decided
Authority: Creator

Decision

A Fact is only as trustworthy as the entity it describes. So the pipeline gates extraction on type confidence: it will not auto-extract facts from an entity until it is confident what kind of thing that entity is. Confidence is graded by how many independent signals agree:

  • Corroborated — an authoritative type (Wikidata) and the source's own infobox agree. Highest trust; auto-extracts.
  • Single-signal — only one weak indicator, with nothing to corroborate it. Held for human validation.
  • Conflicting — the infobox disagrees with the chosen type. Held; a conflict is a red flag, not a coin toss.
  • No usable signal — held.

The Corroborated tier was promoted to "trusted" once we were satisfied that agreement between independent sources clears the bar; every tier below it waits for a human. The default is fail-closed: when in doubt, don't publish.
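
The grading and the gate can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the names `TypeTrust`, `grade_type_trust`, and `may_auto_extract` are hypothetical, the two inputs are assumed to be already-normalized type labels, and the "chosen type" is assumed to be the Wikidata type whenever one is present.

```python
from enum import Enum
from typing import Optional


class TypeTrust(Enum):
    """The four confidence tiers named in this entry."""
    CORROBORATED = "corroborated"
    SINGLE_SIGNAL = "single-signal"
    CONFLICTING = "conflicting"
    NO_SIGNAL = "no-signal"


def grade_type_trust(wikidata_type: Optional[str],
                     infobox_type: Optional[str]) -> TypeTrust:
    """Grade type confidence by how many independent signals agree.

    Assumption: both arguments are normalized type labels (e.g. "film"),
    or None / empty string when the signal is missing.
    """
    signals = [s for s in (wikidata_type, infobox_type) if s]
    if not signals:
        return TypeTrust.NO_SIGNAL
    if len(signals) == 1:
        return TypeTrust.SINGLE_SIGNAL
    if wikidata_type == infobox_type:
        return TypeTrust.CORROBORATED
    # Both signals are present and they disagree.
    return TypeTrust.CONFLICTING


def may_auto_extract(trust: TypeTrust) -> bool:
    """Fail-closed gate: only the explicitly trusted tier passes."""
    return trust is TypeTrust.CORROBORATED
```

Writing the gate as an allow-list of one tier, rather than a deny-list of the other three, means any tier added later is held for a human by default.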

Why

The graph is only valuable if its facts are attached to the right kind of thing. A fact attached to the wrong kind of thing is worse than no fact at all: it is confidently wrong, and it propagates. A character's "homeworld" makes no sense on a film; a film's "box office" makes no sense on a person. Getting the type right first is the cheapest place to stop an entire class of garbage, and corroboration across independent sources is a far better confidence signal than any single source's say-so. This is the same fail-closed instinct as source compliance (Entry 033) and citation tiers (Entry 020), applied one layer earlier, to the entity itself.
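
To make the "homeworld on a film" example concrete, here is a small sketch of a type-scoped property check. The vocabulary and the names `ALLOWED_PROPERTIES` and `property_fits_type` are invented for illustration; the real per-type vocabulary is not specified in this entry.

```python
# Hypothetical per-type vocabulary: which properties make sense on which type.
ALLOWED_PROPERTIES = {
    "character": frozenset({"homeworld", "species"}),
    "film": frozenset({"box_office", "director"}),
    "person": frozenset({"birth_date", "occupation"}),
}


def property_fits_type(entity_type: str, prop: str) -> bool:
    """Return True only if `prop` is in the vocabulary for `entity_type`.

    Unknown types get an empty vocabulary, so the check fails closed.
    """
    return prop in ALLOWED_PROPERTIES.get(entity_type, frozenset())
```

A check like this only catches nonsense if the type is right in the first place: a film mistyped as a character would happily accept a "homeworld".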

It also keeps humans where they add value: not rubber-stamping the obvious, but adjudicating the genuinely ambiguous middle band.

Open threads

  • The single schema source: the per-type vocabulary that display uses and the vocabulary the extractor asks for should be one curated source, not two that can drift. Approach and timing are still to settle.
  • Spot-validating samples of the ambiguous band to turn the confidence thresholds from a careful guess into measured accuracy.
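
One possible way to turn spot-validation into a measured number is to have reviewers mark each sampled entity's type as right or wrong, then report the observed accuracy with a confidence interval. The sketch below uses the Wilson score interval; that choice, the function name `wilson_interval`, and the default 95% level (z = 1.96) are assumptions, not something this entry has decided.

```python
import math
from typing import Sequence, Tuple


def wilson_interval(verdicts: Sequence[bool],
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Return (observed accuracy, lower bound, upper bound).

    `verdicts` holds one boolean per reviewed sample: True if the
    reviewer confirmed the type, False otherwise.
    """
    n = len(verdicts)
    if n == 0:
        raise ValueError("need at least one reviewed sample")
    p = sum(1 for v in verdicts if v) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width
```

The lower bound, rather than the raw accuracy, is the more cautious number to compare against a promotion threshold, which fits the fail-closed default.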

Related

  • Entry 020 — citation strength tiers
  • Entry 022 — the AI / ML role split (propose, dispose, enforce)
  • Entry 033 — fail-closed source compliance
  • Entry 035 — entity facts as registry data (the display side of the same schema)
  • G-026 — Fact graph data quality
  • G-060 — the review queue where held facts are adjudicated