Entry 031: Gamifying the graph — fun as a quality flywheel, tuned by A/B

Decision

Turning prose-into-Facts into something members want to do is the highest-leverage move on the authoring friction that G-057 names as make-or-break. Four principles govern how OLN gamifies it.

Gamify quality, never volume. The graph's entire value is trust (the AEO citability bet, Entry 030; data quality, G-026). Rewarding how many Facts someone adds is Goodhart's law pointed at the moat — it floods the graph with unsourced or invented Facts. So rewards attach to accuracy, resolution of contested/low-confidence Facts, and graph completeness — never raw output. Done this way the fun mechanic is the quality flywheel: every unit of fun spent makes the data better.
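A minimal sketch of what "rewards attach to accuracy, never raw output" could look like in code. The outcome kinds, the weights, and the name qualityReward are all illustrative assumptions, not anything OLN has defined; the only point is that a reward is computed from validated outcomes and that reverted contributions cost more than they earn.

```typescript
// Hypothetical outcome kinds; none of these names come from the OLN codebase.
type ContributionOutcome =
  | { kind: "confirmed_accurate" }  // a confirmation later upheld by consensus
  | { kind: "resolved_contested" }  // a contested or low-confidence Fact was settled
  | { kind: "filled_gap" }          // a missing source or edge was supplied
  | { kind: "reverted" };           // the contribution was later corrected or reverted

// Illustrative weights only; real values would have to come out of experiments.
const WEIGHTS: Record<ContributionOutcome["kind"], number> = {
  confirmed_accurate: 1,
  resolved_contested: 3,
  filled_gap: 2,
  reverted: -4,
};

// Reward is a function of validated outcomes, not of how many Facts were submitted.
// A submission with no validated outcome yet contributes nothing.
function qualityReward(outcomes: ContributionOutcome[]): number {
  let total = 0;
  for (const outcome of outcomes) {
    total += WEIGHTS[outcome.kind];
  }
  // Floor at zero so a bad streak does not produce a negative balance.
  return Math.max(0, total);
}
```

Under these assumed weights, ten reverted Facts earn nothing, while one resolved contradiction plus one filled gap earns 5. Whether a floor at zero is the right choice is itself a design question this sketch does not settle.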
Use native graph mechanics, not bolted-on points. The structured graph is already a game a flat wiki can't be:
- Holes pull — the graph shows its own gaps (unsourced Facts, unconfirmed proposals, missing edges) and generates its own quests.
- Confirming is a swipe — the G-057 loop (AI proposes, human confirms / rejects / corrects) is a low-friction micro-game where the play is the data work (Entry 022 AI role split, Entry 021 Ingest Studio).
- Edges are combos — asserting a relationship nobody else spotted is a discovery, collectible in a way prose edits never are.
- Contradiction-hunting is a treasure hunt — Fact states (canon / speculation / contradiction) let QA masquerade as detective work.
- Consumption credentials you — Layer 10 consumption history unlocks the standing to verify Facts about what you've actually consumed.
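The "holes pull" mechanic above can be sketched as a pure function from graph gaps to quests. The Fact shape, the Quest type, and questsFromGaps are hypothetical names for illustration; only the three Fact states (canon, speculation, contradiction) come from this entry. Missing-edge detection is left out because it depends on schema details not described here.

```typescript
// Hypothetical, simplified Fact shape; the real model is not specified in this entry.
interface Fact {
  id: string;
  state: "canon" | "speculation" | "contradiction";
  sourceIds: string[];   // empty means the Fact is unsourced
  confirmed: boolean;    // false means an AI proposal still awaits a human decision
}

interface Quest {
  factId: string;
  type: "add_source" | "confirm_proposal" | "resolve_contradiction";
}

// Each gap in the graph becomes a quest; no quest exists unless the data needs work.
function questsFromGaps(facts: Fact[]): Quest[] {
  const quests: Quest[] = [];
  for (const fact of facts) {
    if (fact.state === "contradiction") {
      quests.push({ factId: fact.id, type: "resolve_contradiction" });
    }
    if (fact.sourceIds.length === 0) {
      quests.push({ factId: fact.id, type: "add_source" });
    }
    if (!fact.confirmed) {
      quests.push({ factId: fact.id, type: "confirm_proposal" });
    }
  }
  return quests;
}
```

Because quests are derived from the graph rather than authored, the quest list shrinks as quality improves, which is the property that keeps the mechanic from rewarding volume.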
Intrinsic over extrinsic — impact, not just credits. Points and credits (G-037) are extrinsic and can crowd out the intrinsic motivation that makes contribution durable (the overjustification effect). The primary mechanic is the impact echo — showing a contributor that their one confirmation cascaded ("now powers 14 pages / appeared in an answer-engine citation"), a move only the graph model can make. Mastery, autonomy, purpose, and recognition (G-051) lead; credits support.
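One way the impact echo could be computed is a breadth-first walk from the confirmed Fact over "is used by" edges, counting the distinct pages reached. This is a sketch under assumptions: the dependents adjacency map, the isPage predicate, and the function name impactEcho are hypothetical, and how OLN actually records which pages a Fact powers is not stated in this entry.

```typescript
// dependents maps a node id to the ids of nodes that directly use it (assumed structure).
function impactEcho(
  start: string,
  dependents: Record<string, string[]>,
  isPage: (id: string) => boolean,
): number {
  const seen: Record<string, boolean> = {};
  seen[start] = true;
  const queue: string[] = [start];
  let pages = 0;

  while (queue.length > 0) {
    const current = queue.shift() as string;
    const next = dependents[current] || [];
    for (const id of next) {
      if (seen[id]) continue;   // count each downstream node once
      seen[id] = true;
      if (isPage(id)) pages += 1;
      queue.push(id);
    }
  }
  return pages;
}
```

The returned count is what would fill the "now powers N pages" message; answer-engine citations would need a separate signal that this sketch does not model.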
Nothing ships on intuition — the A/B harness decides. OLN already runs Amplitude Experiment (via @amplitude/unified). Every gamification component ships behind a flag as a variant, and is kept only if it proves out paired with a data-quality guardrail metric (revert rate, contradiction rate, consensus disagreement) on a long horizon — not a day-one engagement spike.
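The keep-or-kill rule can be sketched as a comparison in which engagement alone never suffices. The ArmMetrics shape, the 5% tolerance, and keepVariant are illustrative assumptions; the three guardrail metrics are the ones named above. The sketch deliberately ignores statistical significance and does not call any Amplitude API; it only shows the shape of the decision once per-arm metrics are in hand.

```typescript
// Per-arm metrics at the end of the readout horizon (hypothetical shape).
interface ArmMetrics {
  confirmationsPerUser: number;       // engagement metric
  revertRate: number;                 // guardrail
  contradictionRate: number;          // guardrail
  consensusDisagreementRate: number;  // guardrail
}

// A guardrail holds if the variant is no worse than control within a relative tolerance.
function guardrailHolds(control: number, variant: number, tolerance: number): boolean {
  return variant <= control * (1 + tolerance);
}

// Keep a variant only if engagement rises AND every quality guardrail holds.
function keepVariant(control: ArmMetrics, variant: ArmMetrics, tolerance: number = 0.05): boolean {
  const engagementUp = variant.confirmationsPerUser > control.confirmationsPerUser;
  const qualityHolds =
    guardrailHolds(control.revertRate, variant.revertRate, tolerance) &&
    guardrailHolds(control.contradictionRate, variant.contradictionRate, tolerance) &&
    guardrailHolds(control.consensusDisagreementRate, variant.consensusDisagreementRate, tolerance);
  return engagementUp && qualityHolds;
}
```

With this rule, a variant that triples confirmationsPerUser but doubles revertRate is rejected regardless of how large the engagement lift is.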

Reasoning

The reframe that makes this safe is that gamifying a truth substrate is the inverse of gamifying a streak app: the obvious mechanic (reward output) destroys the asset. Constraining rewards to accuracy and resolution turns gamification from a risk to the moat into the engine that builds it.

The A/B system is what makes "make it fun" a tractable claim rather than a hope, but only with discipline. A variant that triples confirmations while doubling the later-correction rate is a loss the harness must be able to see, which means the guardrail (quality) events are as first-class as the engagement events. And because gamification spikes on novelty and the overjustification damage surfaces late, the readout horizon is 30–60 days with a holdout, including a pure-intrinsic arm to test whether points help or quietly hurt retention.
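The measurement discipline described here could be enforced mechanically before anyone reads a result. The sketch below assumes a hypothetical ExperimentPlan record and arm names ("holdout", "pure_intrinsic"); it returns the reasons a readout is premature, using the 30-day lower bound of the horizon stated above.

```typescript
// Hypothetical plan record; field and arm names are assumptions for illustration.
interface ExperimentPlan {
  startedAt: Date;
  arms: string[];
  minHorizonDays: number;
}

const REQUIRED_ARMS: string[] = ["holdout", "pure_intrinsic"];
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns a list of blockers; an empty list means the readout may proceed.
function readoutBlockers(plan: ExperimentPlan, now: Date): string[] {
  const blockers: string[] = [];
  const elapsedDays = (now.getTime() - plan.startedAt.getTime()) / MS_PER_DAY;

  if (plan.minHorizonDays < 30) {
    blockers.push("horizon shorter than 30 days");
  }
  if (elapsedDays < plan.minHorizonDays) {
    blockers.push("horizon not yet reached");
  }
  for (const arm of REQUIRED_ARMS) {
    if (plan.arms.indexOf(arm) === -1) {
      blockers.push("missing arm: " + arm);
    }
  }
  return blockers;
}
```

A check like this would make a day-one engagement spike unreadable by construction, since the function reports blockers until the horizon has elapsed and both comparison arms exist.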

Open threads

Gamified contribution & the experiment program (G-059) — the component menu to test, the quality-guardrail event taxonomy, the consensus / anti-Goodhart confirmation design, and the measurement methodology (horizon, holdout, crowd-out arm).