Operationalize the "citable, not trainable" posture (Entry 030): a per-crawler-class allow/block policy, WAF-level enforcement behind robots.txt, and licensing tiers that keep OLN freely citable while holding the structured graph in reserve as the asset a partnership buys.
Why this matters
Entry 030 decides the posture — allow citation crawlers, block training crawlers, license the structured graph not the CC-BY-SA prose. What remains open is the operationalization, and it is time-sensitive: a training crawl, once it happens, can't be undone. Open decisions:
- Per-class crawler policy: the concrete `robots.txt` and the maintained user-agent lists behind it. Allow live citation/retrieval (OAI-SearchBot, PerplexityBot, AI-Overview fetch) and classic search (Googlebot); block training (GPTBot, CCBot, Google-Extended) by default. Who owns the list as new bots appear, and what the default is for an unknown AI user-agent.
- Enforcement beyond the honor system: `robots.txt` and `Google-Extended` are honored by the major labs but ignored by many scrapers, so "block" is only real with WAF-level enforcement (e.g. Cloudflare AI-bot rules). Decide the enforcement layer and how aggressive it is, weighed against false-positives on legitimate citation crawlers we want.
- Licensing tiers: define exactly what is free vs. reserved. Free: live, attributed citation. Reserved (the partnership product): bulk export, real-time/API access, the normalized structured graph, and explicit training rights. Pricing/terms can wait; the boundary cannot.
- CC-BY-SA boundary (scope bounded by G-044): the imported prose is share-alike and cannot be fenced; the licensable asset is strictly the structure OLN creates (Facts, normalized entities, relationships, provenance, freshness). The policy must be written so that it can ship before the still-open ingest question (G-044) is resolved.
- Timing / sequencing: seed citability now to build authority; hold the reserved tier back until the dataset has the density that makes a partnership worth negotiating. Define the trigger for opening the reserved tier.
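To make the per-class policy and the enforcement question concrete, here is a minimal Python sketch, not a decided implementation. It keeps one policy table and derives both the `robots.txt` text and a request-time allow/block decision from it. The user-agent tokens come from the list above; the names `CrawlerClass`, `render_robots_txt`, `classify`, and `is_allowed` are hypothetical, and the default for unknown user-agents is deliberately left as a required argument because that decision is still open. "AI-Overview fetch" is omitted because the text above names no user-agent token for it.

```python
from enum import Enum


class CrawlerClass(Enum):
    CITATION = "citation"  # live citation / retrieval
    SEARCH = "search"      # classic search indexing
    TRAINING = "training"  # model-training crawls
    UNKNOWN = "unknown"    # not on any maintained list


# Maintained user-agent lists (tokens from the policy text; ownership is open).
# Note: Google-Extended is a robots.txt control token. Google documents that it
# has no separate HTTP user-agent string, so header matching will not see it.
USER_AGENTS = {
    CrawlerClass.CITATION: ["OAI-SearchBot", "PerplexityBot"],
    CrawlerClass.SEARCH: ["Googlebot"],
    CrawlerClass.TRAINING: ["GPTBot", "CCBot", "Google-Extended"],
}

# Per-class decision for the known classes.
ALLOWED = {
    CrawlerClass.CITATION: True,
    CrawlerClass.SEARCH: True,
    CrawlerClass.TRAINING: False,
}


def render_robots_txt() -> str:
    """Render one robots.txt group per listed user-agent token."""
    lines = []
    for crawler_class, agents in USER_AGENTS.items():
        rule = "Allow: /" if ALLOWED[crawler_class] else "Disallow: /"
        for agent in agents:
            lines += [f"User-agent: {agent}", rule, ""]
    return "\n".join(lines)


def classify(user_agent: str) -> CrawlerClass:
    """Map a User-Agent header to a class by case-insensitive token match."""
    ua = user_agent.lower()
    for crawler_class, agents in USER_AGENTS.items():
        if any(agent.lower() in ua for agent in agents):
            return crawler_class
    return CrawlerClass.UNKNOWN


def is_allowed(user_agent: str, allow_unknown: bool) -> bool:
    """Request-time decision; allow_unknown encodes the still-open default."""
    crawler_class = classify(user_agent)
    if crawler_class is CrawlerClass.UNKNOWN:
        return allow_unknown
    return ALLOWED[crawler_class]


if __name__ == "__main__":
    print(render_robots_txt())
    print(is_allowed("Mozilla/5.0 (compatible; GPTBot/1.0)", allow_unknown=False))
```

In this sketch `UNKNOWN` also covers ordinary browsers, so a real enforcement layer would first need some way to decide whether a request is an AI crawler at all. User-Agent headers can also be spoofed, which is part of why the false-positive trade-off above needs an explicit decision rather than a string match alone.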
Related
- Entry 030 — the deciding posture (citable, not trainable)
- G-055 — discoverability moat: this policy is what makes "citable" real
- G-026 — Fact-graph data quality: the structured asset being gated/licensed
- G-029 — HellaThis intercompany licensing: precedent for licensing the graph
- G-044 — Fandom CC-BY-SA derivative scope: bounds what is licensable at all
- G-005 — AI policy: the contributor-attribution principle this posture extends