



Sources: Wikipedia MediaWiki Action API, Wikipedia Vital Articles / Level 4
Tools: Bruin CLI, BigQuery, Bruin dac
Methodology
Universe. Two tiers (14,004 articles total, 11 top-level subjects, 110 sub-subjects). Tier 1: Wikipedia Vital Articles / Level 4 – 9,907 curated articles across all 11 subjects. Tier 2: 4,097 WikiProject Top/High-importance articles from Companies, Brands, Computing, Internet culture, and Business – added only to Society and social sciences (+2,735) and Technology (+1,362) to compensate for those areas being under-represented in Vital L4. Vital takes priority on collision.
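The "Vital takes priority on collision" rule can be illustrated with a minimal sketch. The function name, the record layout, and the three toy articles below are hypothetical and are not taken from the actual pipeline; the only thing carried over from the description is that an article present in both tiers is kept as a Tier 1 entry.

```python
def build_universe(vital, wikiproject):
    """Merge two tiers keyed by article title; Tier 1 (Vital) wins on collision.

    Both arguments map article title -> top-level subject (hypothetical layout).
    """
    universe = {}
    # Insert Tier 2 first, then overwrite with Tier 1 so Vital takes priority.
    for title, subject in wikiproject.items():
        universe[title] = {"subject": subject, "tier": 2}
    for title, subject in vital.items():
        universe[title] = {"subject": subject, "tier": 1}
    return universe


# Toy data for illustration only, not real tier membership.
vital = {"Computer": "Technology", "Money": "Society and social sciences"}
wikiproject = {"Computer": "Technology", "Apple Inc.": "Society and social sciences"}
universe = build_universe(vital, wikiproject)
```

Because the merge is keyed by title, the colliding article is counted once, which is consistent with the two tier counts summing to the stated total (9,907 + 4,097 = 14,004).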
AI seed list. 48 curated AI-topic articles spanning foundations (Artificial intelligence, Machine learning, Neural network, Deep learning, Supervised/Unsupervised/Self-supervised learning), architectures (Transformer, CNN, RNN, GAN, Diffusion model, Attention, LSTM), modern systems (LLM, GPT-3, GPT-4, ChatGPT, Claude, Gemini, LLaMA, BERT, Stable Diffusion, DALL-E, Midjourney, Generative AI, Foundation model), companies (OpenAI, Anthropic, DeepMind, Hugging Face), sub-fields (NLP, Computer vision, RL, Speech recognition, Symbolic AI, Machine translation, Robotics, Expert system), and cultural/policy (AI alignment, safety, ethics, AGI, existential risk, technological singularity, regulation, AI winter). Each canonical title is expanded with its current redirect aliases.
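One way the alias expansion might look is sketched below. It assumes the redirects come from the Action API's `prop=redirects` module and that responses use the `formatversion=2` layout (a list of pages, each with an optional `redirects` list). The description does not say which request was actually used, and the sample response is made up for illustration.

```python
def redirects_query_params(title):
    """Request parameters for listing redirects to `title` (assumed request shape)."""
    return {
        "action": "query",
        "prop": "redirects",
        "titles": title,
        "rdlimit": "max",
        "format": "json",
        "formatversion": 2,
    }


def expand_aliases(seed_titles, api_responses):
    """Return the seed titles plus every redirect title found in the responses."""
    aliases = set(seed_titles)
    for response in api_responses:
        for page in response.get("query", {}).get("pages", []):
            for redirect in page.get("redirects", []):
                aliases.add(redirect["title"])
    return aliases


# Hypothetical response for one seed article.
sample_response = {
    "query": {
        "pages": [
            {
                "title": "Large language model",
                "redirects": [{"title": "LLM"}, {"title": "Large language models"}],
            }
        ]
    }
}
aliases = expand_aliases(["Large language model"], [sample_response])
```

The result is a flat set, so a wikilink to either the canonical title or any alias matches with a single membership test.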
Snapshots. 14 semiannual snapshots at fixed dates (December 1 and May 1, Dec 2019 through May 2026). For each (article × date) pair, the MediaWiki Action API returns the latest revision at or before the target date. Body wikilinks (regex-extracted from wikitext, excluding namespace, self, and anchor links) are then intersected with the AI alias list to count "AI references".
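The snapshot step could be sketched roughly as follows. The regex, the helper names, and the choice of request parameters are assumptions, since the description does not give the actual ones. This version treats any target containing a colon as a namespace link, treats "anchor links" as same-page links such as `[[#History]]`, and counts link occurrences rather than distinct targets.

```python
import re

# Captures the link target, dropping any "#section" fragment and "|label" part.
# A same-page anchor link such as [[#History]] has no target and does not match.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize_title(raw):
    """Trim, convert underscores to spaces, and upper-case the first letter."""
    title = raw.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def revision_query_params(title, snapshot_iso):
    """Parameters asking for the single newest revision at or before a timestamp."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvstart": snapshot_iso,  # e.g. "2023-05-01T00:00:00Z"
        "rvdir": "older",         # enumerate backwards in time from rvstart
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "format": "json",
    }


def count_ai_references(wikitext, article_title, ai_aliases):
    """Count body wikilinks whose target is in the AI alias set."""
    own_title = normalize_title(article_title)
    count = 0
    for match in WIKILINK_RE.finditer(wikitext):
        target = normalize_title(match.group(1))
        # Skip namespace links (File:, Category:, ...) and links to the article itself.
        if ":" in target or target == own_title:
            continue
        if target in ai_aliases:
            count += 1
    return count


sample_wikitext = (
    "The [[Transformer (deep learning architecture)|transformer]] underlies "
    "[[ChatGPT]]; see [[#History]], [[File:Robot.png|thumb]] and "
    "[[machine learning#History]]."
)
sample_aliases = {"ChatGPT", "Machine learning", "Transformer (deep learning architecture)"}
n_refs = count_ai_references(sample_wikitext, "Natural language processing", sample_aliases)
```

The colon test is a simplification: it also drops ordinary articles whose titles contain a colon, which is harmless only as long as no AI alias has one.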
Pipeline. Raw scrapes -> staging joins -> subject/sub-subject/article aggregates. This dashboard queries staging.wat_ai_reference_counts directly. All assets run via Bruin CLI on BigQuery; the dashboard renders via Bruin dac.
Limitations & caveats
Slicing & filtering. Gainer charts rank by absolute percentage-point gain since Dec 2019, not relative growth; the sub-subject chart shows the top 8 only. Both gainer charts and every small-multiples panel apply the same eligibility filter: n>=20 articles AND >=1 AI-referencing article at the latest snapshot. The 20-article floor avoids small-denominator noise (e.g. a 2-article sub-subject swinging to 50% on a single edit). Small-multiples panels show up to 7 sub-subjects (top by article count); panels with sparse AI uptake show fewer (History 2; Everyday life and Geography 3; Mathematics 4; Arts and Physical sciences 5; Biology & health 6).
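The eligibility filter and the ranking by absolute gain can be sketched as below. Field names and the toy numbers are hypothetical, and the sketch assumes the same article count is used as the denominator at both the first and the latest snapshot, which the description does not state explicitly.

```python
def rank_gainers(groups, min_articles=20, top_n=8):
    """Rank groups by percentage-point gain in AI-referencing share."""
    # Eligibility: at least `min_articles` articles AND at least one
    # AI-referencing article at the latest snapshot.
    eligible = [
        g for g in groups
        if g["n_articles"] >= min_articles and g["ai_latest"] >= 1
    ]

    def pp_gain(g):
        share_first = 100.0 * g["ai_first"] / g["n_articles"]
        share_latest = 100.0 * g["ai_latest"] / g["n_articles"]
        return share_latest - share_first  # absolute, not relative, change

    ranked = sorted(eligible, key=pp_gain, reverse=True)
    return [(g["name"], round(pp_gain(g), 1)) for g in ranked[:top_n]]


# Toy sub-subjects for illustration only.
groups = [
    {"name": "A", "n_articles": 200, "ai_first": 10, "ai_latest": 30},  # 5% -> 15%
    {"name": "B", "n_articles": 50, "ai_first": 1, "ai_latest": 4},     # 2% -> 8%
    {"name": "C", "n_articles": 2, "ai_first": 0, "ai_latest": 1},      # below the floor
    {"name": "D", "n_articles": 40, "ai_first": 0, "ai_latest": 0},     # no AI references
]
gainers = rank_gainers(groups)
```

In this toy data A outranks B on percentage points (+10 vs +6) even though B grew more in relative terms (4x vs 3x), and C is excluded although a single edit moved it from 0% to 50%.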
Comparability. In the small-multiples grid, per-panel y-ranges are independent – compare shapes, not heights. The universe is not uniform across subjects: only Society and social sciences and Technology receive the WikiProject Top/High extension; the other 9 subjects are Vital L4 only. Cross-subject magnitudes therefore reflect both AI uptake AND uneven corpus composition.
What "AI reference" means. A structural body wikilink to one of 48 curated AI articles (plus current redirect aliases), not a semantic measure of AI content. Template-generated and navbox links are excluded; only editor-chosen body links count.
Scope. Universe is curated (Vital L4 + WikiProject Top/High in 5 categories = 14,004 articles), not a random or exhaustive sample of Wikipedia. English Wikipedia only. Results generalise to "important, well-edited articles", not to long-tail content.
Time. Some AI seed pages did not exist in 2019 (e.g. ChatGPT, GPT-4, Claude, Gemini, LLaMA, Stable Diffusion, Midjourney), so apparent growth partly reflects new AI vocabulary entering Wikipedia rather than only existing articles adopting new links. Snapshots are semiannual (Dec 1 / May 1), so spikes shorter than ~6 months and revisions reverted between snapshots are invisible. The MediaWiki API returns the latest revision at or before each snapshot date, so a change made just after a snapshot date does not appear until the next snapshot, roughly 6 months later (5 or 7 months, given the Dec 1 / May 1 spacing).
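A small sketch makes the sampling effect concrete. The snapshot dates follow the Dec 1 / May 1 schedule described above; the revision history is invented, and the helper names are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def snapshot_dates(first_year=2019, last_year=2026):
    """Dec 1 of the first year, then May 1 and Dec 1 alternating, ending on May 1."""
    dates = [date(first_year, 12, 1)]
    for year in range(first_year + 1, last_year + 1):
        dates.append(date(year, 5, 1))
        if year < last_year:
            dates.append(date(year, 12, 1))
    return dates


def revision_at_or_before(revisions, target):
    """Return the last (date, ai_ref_count) pair dated at or before `target`.

    `revisions` must be sorted by date; returns None if the article has no
    revision yet at `target`.
    """
    idx = bisect_right([rev_date for rev_date, _ in revisions], target)
    return revisions[idx - 1] if idx > 0 else None


# Invented history: AI links are added in January and reverted in March.
revisions = [
    (date(2022, 11, 20), 0),
    (date(2023, 1, 10), 3),
    (date(2023, 3, 5), 0),
    (date(2023, 6, 2), 1),
]
observed = {
    snap: revision_at_or_before(revisions, snap)
    for snap in (date(2022, 12, 1), date(2023, 5, 1), date(2023, 12, 1))
}
```

Here the Dec 2022 and May 2023 snapshots both record 0 AI references, so the January-to-March spike never shows up, and the link added on 2 June 2023 first appears in the Dec 2023 snapshot.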
by uncertainschrodinger