Operationalize the "gamify quality, not volume" decision (Entry 031): the gold-salting + reliability-weighting spine that decouples fun from authority, the component catalog (each as proximate-win + quality guardrail), the newcomer practice ramp, and the measurement methodology.
Why this matters
Entry 031 decides the principles; this entry drafts the mechanism that makes "gamify quality, not volume" real, plus the components to A/B test.
The spine: decouple fun from authority
Borrowed from reCAPTCHA / professional crowd-labeling. Seed the queue (G-060) with
gold-standard cards whose answer is already known — gold-true (from aged,
high-tier, never-contested Canon) and gold-false (from the retcon log / corrupted
Facts) — indistinguishable from live cards and self-replenishing from the graph
itself. A reviewer's hit-rate on gold becomes a reliability score, which both
weights their votes on live cards and gates their rewards.
This dissolves Goodhart: fun is unlimited, authority is earned. Spam-confirming just tanks your gold accuracy, so your votes stop counting and rewards dry up — you cannot degrade the graph by playing fast, because fast-and-wrong de-weights itself. Four rules hang off the spine:
- Blind voting — no visibility of others' votes/tally before you commit (kills bandwagon cascades).
- Reward matched-consensus, never participation — you earn when your call matches the eventual graduated outcome; rubber-stamping the obvious pays ~nothing.
- Vesting on graduation — rewards vest only when the Fact reaches Canon and survives a challenge window.
- Asymmetric stakes — a wrong call on a gold card costs more reliability than a Skip. Confidence is taxed; humility is free.
Component catalog
Each ships behind a G-064 flag with a proximate metric and a quality guardrail (Entry 031):
| Component | Mechanic | Buys (quality behavior) | Guardrail | Metric pair (proximate ▸ guardrail) |
|---|---|---|---|---|
| Confirmation Swipe | Tinder-for-Facts card stack | Provisional→Canonical throughput | gold salting + reliability weighting | confirms/session ▸ gold-accuracy & 30-day revert |
| Reliability score | a trust meter, not points | accuracy becomes the currency | decays if accuracy slips | median reliability ▸ graph revert rate |
| Confidence-weighted bounties | queue surfaces "worth more" cards | effort to where quality is weakest | bounty cards need higher N | contested-Fact resolution ▸ accuracy on them |
| Canon Detective | contradiction treasure-hunt | surfaces latent defects | false reports cost reliability | confirmed contradictions ▸ false-positive rate |
| Completion meter / gap quests | "64% sourced"; graph emits its holes | completeness | counts only confirmed Facts | gap-closure ▸ revert rate of gap-fills |
| First-Finder credit | permanent attribution (Layer 9) | net-new valid edges, recognition | vests on graduation + survival | net-new edges ▸ spurious rate |
| Calibrated streak | streak counts only above-accuracy days | sustained engagement, no farming | breaks on accuracy drop, not missed volume | return rate ▸ accuracy maintenance |
| Impact echo | "your confirms now power 14 pages; 1 cited" | the intrinsic capstone | none — celebrates real value | 7/30-day retention ▸ — |
Newcomer ramp (practice mode)
Reliability scoring is punitive to newcomers unless they can earn a first score safely. A practice mode of already-settled cards with instant right/wrong feedback and no live consequence is both the onboarding tutorial (closes the unspecced onboarding ramp, cf. G-007) and where first reliability is earned.
Open decisions
- Gold-card mint/rotation cadence (anti-memorization); the reliability→weight curve.
- Reward currency: relation to Credits/Power (G-037) vs. pure standing (G-051).
- Measurement: 30–60 day horizons, holdout, and a pure-intrinsic arm to detect overjustification crowd-out (Entry 031), all via G-064.
Related
- Entry 031 — deciding principles · Entry 030 — citability the quality protects
- G-057 — authoring loop · G-060 — the review queue this rides on
- G-026 — data quality (guardrail) · G-051 — recognition · G-036 — reactions
- G-037 — credits (extrinsic layer) · G-058 — where impact-echo's "cited" lands
- G-064 — experiment harness · G-007 — onboarding (practice ramp)