What 21,000 AI Citations Reveal About Generative Engine Optimization

A recent study of 21,143 AI citations across ChatGPT, Google AI Overview, and Perplexity shows that depth of absorption matters more than sheer source count. The findings line up with my own Generative Engine Optimization (GEO) efforts and suggest concrete steps for creators who want large language models to actually use their content.

Introduction

A research team recently released a dataset of 21,143 valid citations collected from 602 experimental prompts run on three major AI platforms: ChatGPT, Google AI Overview, and Perplexity. The goal was to distinguish between pages that merely appear in a model’s source list and pages that are actually absorbed into the model’s generated answer. The results map directly onto the set of optimizations I have been building on my own blog, shinobis.com, using vanilla PHP, JSON‑LD, and a markdown‑only delivery mode for agents.

How the study measured influence

The researchers defined two layers:

Search layer – which prompts trigger a web‑search, how many sources are consulted, and which domains surface most often.
Influence layer – for each cited page, an influence_score combines citation frequency, position in the source list, paragraph coverage, and semantic similarity to the final response. A high score means the model actually incorporated content from that page rather than just listing it.

This distinction matters because a page that shows up in a bibliography but contributes no text to the answer is essentially invisible to the end‑user.
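
The study's exact formula is not reproduced here, so the following is only a minimal sketch of how four such signals could be blended into a single score. The weights, the 0..1 normalisation, and the names `CitationSignals` and `WEIGHTS` are assumptions made for illustration, not the researchers' method.

```python
from dataclasses import dataclass


@dataclass
class CitationSignals:
    """Per-page signals, each assumed to be pre-normalised to the range 0..1."""
    citation_frequency: float   # how often the page is cited across runs
    position_score: float       # 1.0 = first in the source list, lower = later
    paragraph_coverage: float   # share of answer paragraphs traceable to the page
    semantic_similarity: float  # similarity between page text and final answer


# Hypothetical weights, chosen only for illustration; they sum to 1.0.
WEIGHTS = {
    "citation_frequency": 0.2,
    "position_score": 0.1,
    "paragraph_coverage": 0.3,
    "semantic_similarity": 0.4,
}


def influence_score(signals: CitationSignals) -> float:
    """Weighted average of the four signals (an assumed combination rule)."""
    return sum(getattr(signals, name) * weight for name, weight in WEIGHTS.items())


# A page that is listed prominently but barely used, versus one that is absorbed.
listed_only = CitationSignals(0.5, 0.9, 0.0, 0.1)
absorbed = CitationSignals(0.5, 0.3, 0.8, 0.9)

print(round(influence_score(listed_only), 3))
print(round(influence_score(absorbed), 3))
```

Under these made-up weights, the absorbed page scores far higher even though it sits lower in the source list, which is the distinction the two layers are meant to capture.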

Key platform differences

Platform	Avg. sources per prompt	Avg. influence per citation
ChatGPT	6.88	0.2713
Google AI Overview	12.06	0.0584
Perplexity	16.35	0.0646

ChatGPT cites fewer sites, but each citation carries roughly 4.6× the weight of a Google AI Overview citation (0.2713 vs. 0.0584) and about 4.2× the weight of a Perplexity citation (0.2713 vs. 0.0646). Google and Perplexity cast a wider net but treat each source more superficially.
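
The arithmetic behind that comparison can be checked with a few lines of Python. The figures are copied from the table above. The per-prompt product is my own rough proxy (average sources times average influence), not a metric reported by the study.

```python
# Platform averages copied from the table above.
platforms = {
    "ChatGPT": {"sources": 6.88, "influence": 0.2713},
    "Google AI Overview": {"sources": 12.06, "influence": 0.0584},
    "Perplexity": {"sources": 16.35, "influence": 0.0646},
}

baseline = platforms["ChatGPT"]["influence"]


def weight_ratio(platform: str) -> float:
    """How many times heavier a ChatGPT citation is than one on `platform`."""
    return baseline / platforms[platform]["influence"]


def influence_per_prompt(platform: str) -> float:
    """Assumed proxy: average sources per prompt x average influence per citation."""
    p = platforms[platform]
    return p["sources"] * p["influence"]


for name in platforms:
    print(f"{name}: ratio={weight_ratio(name):.2f}, "
          f"per-prompt={influence_per_prompt(name):.2f}")
```

The ratios come out at about 4.65 against Google AI Overview and 4.20 against Perplexity. Because a product of averages is not the same as an average of products, the per-prompt figure is only a back-of-the-envelope comparison.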

What makes a page deeply absorbed?

The study broke cited pages into influence quartiles. The top 25 % of pages averaged 1,943 words, 10.59 headings, and 47 paragraphs, while the bottom 25 % averaged only 170 words, fewer than one heading, and eight paragraphs. The study's main findings on length and structure:

Length matters, but only up to a point. Influence rises sharply up to about 3,000 words and then plateaus.
Structural density – many headings, lists, and short paragraphs – gives the model more “hooks” to extract facts.
Content signals – numbers, definitions, comparisons, and step‑by‑step instructions – boost influence by 41 %‑62 %.
Q&A format actually hurts influence, dropping it by roughly 6 %.
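
To see where a draft falls relative to those quartile averages, a small audit script is enough. This is a sketch that assumes the article is stored as markdown with `#`-style headings and blank lines between paragraphs. The function name `structure_stats` is hypothetical, and the word count deliberately excludes heading text.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")


def structure_stats(markdown: str) -> dict:
    """Count headings, paragraphs, and body words in a markdown document."""
    lines = markdown.splitlines()
    heading_count = sum(1 for ln in lines if HEADING.match(ln))
    # Blank out heading lines so each heading also acts as a paragraph break.
    body = "\n".join("" if HEADING.match(ln) else ln for ln in lines)
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    return {
        "words": len(body.split()),
        "headings": heading_count,
        "paragraphs": len(paragraphs),
    }


sample = """# Title

First paragraph with five words.

## Section
Second paragraph here.
"""

stats = structure_stats(sample)
print(stats)
```

Comparing the resulting counts with the top-quartile averages quoted above (1,943 words, 10.59 headings, 47 paragraphs) gives a quick sense of whether a page is structurally thin.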

The four multipliers and one penalty

Feature	Influence lift
Numerical data & statistics	+61.55 %
Clear definitions	+57.33 %
Structured comparisons	+55.28 %
How‑to steps	+41.20 %
Q&A format	–5.74 %

These percentages come from regression models that control for length and domain authority, so they are less likely to be artifacts of longer or more authoritative pages and more likely to reflect the content features themselves.

How my blog already aligns with the data

GEO tactic	Study insight it matches
Direct opening statements (no narrative preamble)	Semantic alignment is the strongest predictor of influence (correlation 0.43)
JSON‑LD abstract field populated from the excerpt	Agents read the abstract first, deciding whether to ingest the full article
Automatic knowledge‑graph entities (about, mentions, relatedLink, citation)	Defined structure and explicit relationships raise absorption
Markdown‑only response for `text/markdown` agents	Clean, noise‑free content improves processing efficiency
Trilingual publishing (English, Spanish, Japanese)	English dominates citations (≈ 90 %), but multilingual content serves niche audiences without hurting the English‑language influence
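
The JSON‑LD rows above can be illustrated with a short generator. My blog does this in PHP, so the Python below is only a language-neutral sketch. The function name `build_json_ld`, the sample values, and the example URL path are hypothetical. The property names (`abstract`, `about`, `mentions`, `citation`) are standard schema.org CreativeWork properties.

```python
import json


def build_json_ld(title, excerpt, url, about, mentions, citations) -> str:
    """Serialise a BlogPosting with an abstract and explicit entity links."""
    doc = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "url": url,
        # The abstract is populated straight from the post excerpt.
        "abstract": excerpt,
        "about": [{"@type": "Thing", "name": name} for name in about],
        "mentions": [{"@type": "Thing", "name": name} for name in mentions],
        "citation": citations,
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


example = build_json_ld(
    title="What 21,000 AI Citations Reveal About Generative Engine Optimization",
    excerpt="Depth of absorption matters more than sheer source count.",
    url="https://shinobis.com/example-post",  # hypothetical path
    about=["Generative Engine Optimization"],
    mentions=["ChatGPT", "Google AI Overview", "Perplexity"],
    citations=["GEO Citation Lab study"],
)
print(example)
```

The output is meant to be embedded in a `<script type="application/ld+json">` element in the page head.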

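In practice, my posts sit in the 1,200‑2,500‑word sweet spot, contain 6‑10 H2 sections, and open each section with a concise definition. The data is consistent with those choices: that word range brackets the top quartile's 1,943‑word average and stays below the roughly 3,000‑word plateau.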
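
The markdown‑only delivery mode from the table comes down to content negotiation on the `Accept` header. The sketch below is a simplified version in Python rather than the PHP my blog uses. The names `wants_markdown` and `render` are hypothetical, and the check only looks for `text/markdown` with a non-zero q-value instead of ranking it against every other listed type.

```python
def wants_markdown(accept_header: str) -> bool:
    """True when the Accept header lists text/markdown with a non-zero q-value."""
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != "text/markdown":
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def render(accept_header: str, markdown_body: str, html_body: str) -> tuple:
    """Pick the representation and return (content_type, body)."""
    if wants_markdown(accept_header):
        return "text/markdown; charset=utf-8", markdown_body
    return "text/html; charset=utf-8", html_body


print(render("text/markdown", "# Post", "<h1>Post</h1>"))
print(render("text/html,application/xhtml+xml", "# Post", "<h1>Post</h1>"))
```

A production version would also send `Vary: Accept` so caches keep the two representations apart.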

What the citation ecosystem looks like

Across all three platforms, official sites, news outlets, and industry verticals account for 79 %‑87 % of citations. The 15 most‑cited domains include Wikipedia, YouTube, Reuters, and LinkedIn – sites with massive authority. However, the study shows that frequency does not equal influence. News articles appear often but have lower average influence than encyclopedia‑style pages that provide dense definitions and structured data.

Platform‑specific strategies

ChatGPT – rewards deep semantic relevance. Pages that blend definitions, data, and narrative flow score highest.
Google AI Overview – values tight title‑to‑question alignment and clear headings. Matching the user’s phrasing in the H1/H2 tags is critical.
Perplexity – prefers modular content that can be broken into independent fragments. A page with many self‑contained subsections performs best.

My current content model already satisfies ChatGPT and Google requirements, and the multiple H2 sections give Perplexity the modularity it likes.
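
Two of these platform preferences are easy to approximate in a pre-publish check: splitting a page into self-contained H2 fragments, and scoring how closely each heading matches a target question. The sketch below assumes markdown input and uses plain token overlap (Jaccard similarity) as a stand-in for alignment. The platforms' real matching is not documented in this post, and the stopword list and function names are my own.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "for", "and",
             "what", "how", "do", "does"}


def tokens(text: str) -> set:
    """Lowercased word tokens with a few common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sections(markdown: str) -> list:
    """Split on H2 headings and return (heading, body) pairs."""
    parts = re.split(r"^##\s+(.+)$", markdown, flags=re.MULTILINE)
    # parts[0] is any text before the first H2; then heading, body, heading, body...
    return [(parts[i].strip(), parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]


def alignment(heading: str, question: str) -> float:
    """Jaccard overlap between heading tokens and question tokens."""
    h, q = tokens(heading), tokens(question)
    return len(h & q) / len(h | q) if h | q else 0.0


page = """Intro text.

## What is generative engine optimization
GEO is the practice of shaping content so AI systems absorb it.

## Pricing
Free.
"""

question = "What is generative engine optimization?"
sections = split_sections(page)
for heading, body in sections:
    print(f"{heading!r}: {alignment(heading, question):.2f}")
```

A heading that restates the question scores 1.0 and an unrelated one scores 0.0, which makes weakly aligned sections easy to spot before publishing.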

Adjustments prompted by the study

Add more hard numbers – every article will now include at least one concrete statistic or metric, not just illustrative examples.
Explicit side‑by‑side comparisons – when discussing tools or frameworks, I will structure the content as a table of features, performance, and cost.
Definition‑first subsections – each H2 will start with a one‑sentence definition before expanding into why it matters or how to use it.

These changes target the highest multipliers identified in the research.
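
These three adjustments can be turned into a crude lint step. The checks below are rough heuristics of my own (any digit counts as a number, pipe or tab characters count as a table, and a first sentence containing "is", "are", "means", or "refers to" counts as a definition), so treat the output as a prompt for review rather than a verdict.

```python
import re

DEFINITION_CUE = re.compile(r"\b(is|are|means|refers to)\b", re.IGNORECASE)


def content_signals(section: str) -> dict:
    """Heuristic checks for a statistic, a comparison table, and a definition-first opening."""
    # Take everything up to the first sentence-ending punctuation mark.
    first_sentence = re.split(r"(?<=[.!?])\s+", section.strip(), maxsplit=1)[0]
    return {
        "has_number": bool(re.search(r"\d", section)),
        "has_table": any(line.count("|") >= 2 or "\t" in line
                         for line in section.splitlines()),
        "definition_first": bool(DEFINITION_CUE.search(first_sentence)),
    }


good = ("GEO is the practice of shaping content so AI systems cite it. "
        "One study analysed 21,143 citations.")
weak = "Last summer I started tinkering. It went well."

print(content_signals(good))
print(content_signals(weak))
```

Running this over each H2 section flags the ones that still lack a number, a comparison table, or a definition-first opening.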

The broader implication

The authors conclude that, in the era of AI‑augmented search, the most valuable content is not the one that simply appears in a source list, but the one that can be decomposed into definitions, numbers, comparisons, and actionable steps. That is essentially the classic “good writing” checklist, now backed by a dataset of over twenty‑one thousand citations.

Where to read the full study

The complete paper and raw dataset are hosted by the GEO Citation Lab: https://geo-citation-lab.org

If you’re building a site that hopes to be referenced by large language models, the takeaway is clear: focus on depth, structure, and evidence. Quantity of citations is a secondary metric; influence per citation is what drives real visibility.

#Generative AI #content optimization #Large Language Models #SEO #Data analysis