G-058: AI crawler & data-licensing posture — OLN Register

Operationalize the "citable, not trainable" posture (Entry 030): a per-crawler-class allow/block policy, WAF-level enforcement behind robots.txt, and licensing tiers that keep OLN freely citable while holding the structured graph in reserve as the asset a partnership buys.

Why this matters

Entry 030 sets the posture: allow citation crawlers, block training crawlers, and license the structured graph rather than the CC-BY-SA prose. What remains open is the operationalization, and it is time-sensitive: once a training crawl has happened, it cannot be undone. Open decisions:

Per-class crawler policy: the concrete robots.txt and the maintained user-agent lists behind it. Allow live citation/retrieval (OAI-SearchBot, PerplexityBot, AI-Overview fetch) and classic search (Googlebot); block training (GPTBot, CCBot, Google-Extended) by default. Still to decide: who owns the list as new bots appear, and what the default is for an unknown AI user-agent.
Enforcement beyond the honor system: the major labs honor robots.txt (including the Google-Extended token), but many scrapers ignore it, so "block" is only real with WAF-level enforcement (e.g. Cloudflare AI-bot rules). Decide the enforcement layer and how aggressive it is, weighed against false positives on the legitimate citation crawlers we want to admit. Note that Google-Extended is a robots.txt control token, not a separate crawler user-agent, so a WAF rule cannot match it; it remains a robots.txt-only control.
Licensing tiers: define exactly what is free and what is reserved. Free: live, attributed citation. Reserved (the partnership product): bulk export, real-time/API access, the normalized structured graph, and explicit training rights. Pricing and terms can wait; the boundary cannot.
CC-BY-SA boundary (depends on G-044): the imported prose is share-alike and cannot be fenced; the licensable asset is strictly the structure OLN creates (Facts, normalized entities, relationships, provenance, freshness). The policy must be written so that it does not depend on the still-open ingest question (G-044) being resolved first.
Timing / sequencing: seed citability now to build authority; hold the premium tier in reserve until the dataset has the density that makes a partnership worth negotiating. Define the trigger for opening the reserved tier.
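
As a starting point for the per-class crawler policy, here is a minimal Python sketch that keeps the class list in one place and derives robots.txt from it. It uses only the user-agent tokens named in this entry; "AI-Overview fetch" is left out because no token is given for it. The names CrawlerClass, AGENT_CLASSES, CLASS_ALLOWED, is_allowed and render_robots_txt are hypothetical, and the block-by-default treatment of unlisted tokens is an assumption standing in for a decision that is still open. It covers the honor-system layer only, not WAF enforcement.

```python
from enum import Enum


class CrawlerClass(Enum):
    CITATION = "citation"  # live citation / retrieval
    SEARCH = "search"      # classic search indexing
    TRAINING = "training"  # model-training crawls


# robots.txt tokens named in this entry, grouped by class.
AGENT_CLASSES = {
    "OAI-SearchBot": CrawlerClass.CITATION,
    "PerplexityBot": CrawlerClass.CITATION,
    "Googlebot": CrawlerClass.SEARCH,
    "GPTBot": CrawlerClass.TRAINING,
    "CCBot": CrawlerClass.TRAINING,
    "Google-Extended": CrawlerClass.TRAINING,
}

# Posture from Entry 030: citation and search allowed, training blocked.
CLASS_ALLOWED = {
    CrawlerClass.CITATION: True,
    CrawlerClass.SEARCH: True,
    CrawlerClass.TRAINING: False,
}


def is_allowed(token: str, unknown_default: bool = False) -> bool:
    """Return the policy decision for a robots.txt user-agent token.

    unknown_default stands in for the still-open decision on unlisted
    AI user-agents; False (block) is an assumption, not settled policy.
    """
    lookup = {name.lower(): cls for name, cls in AGENT_CLASSES.items()}
    cls = lookup.get(token.strip().lower())
    if cls is None:
        return unknown_default
    return CLASS_ALLOWED[cls]


def render_robots_txt() -> str:
    """Render one robots.txt group per listed token."""
    groups = []
    for token in AGENT_CLASSES:
        rule = "Allow: /" if is_allowed(token) else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"
```

Calling render_robots_txt() yields one User-agent group per listed token, so adding a new bot is a one-line change to AGENT_CLASSES by whoever ends up owning the list. The same table could feed the WAF layer, with the caveat that only tokens that appear in real request headers can be matched there.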

Related

Entry 030 - the deciding posture (citable, not trainable)
G-055 - discoverability moat: this policy is what makes "citable" real
G-026 - Fact-graph data quality: the structured asset being gated/licensed
G-029 - HellaThis intercompany licensing: precedent for licensing the graph
G-044 - Fandom CC-BY-SA derivative scope: bounds what is licensable at all
G-005 - AI policy: the contributor-attribution principle this posture extends