# The Citation Drift Index (CDI): the metric that tells you if you're becoming a durable source in AI answers
## Executive summary
Most teams measure "AI visibility" the way they measured early SEO wins: screenshots.
That's a trap.
In AI answers, the real question isn't "Did we get cited once?" It's: "Can we stay cited?"
So here's a standard you can actually operate:
- Citation Drift Index (CDI) → a drift-aware KPI for how stable your citations are over time
- A practical audit method (query set + cadence + controls)
- A diagnostic taxonomy that turns volatility into a backlog
If you already publish in a way that machines can reliably ingest (the Bot-Visible Test, BVT) and you've fixed provenance ambiguity (Entity Anchors), CDI becomes the KPI that tells you whether your authority is compounding.
For execution details, pair CDI with a concrete implementation checklist (The Citation-Ready GEO Checklist) and the system-level view of how mentions are assembled (The Citation Stack).
## The contrarian claim
If you don't track drift, you're not doing GEO.
You're doing AI mention tourism.
One week your brand appears in an answer. Next week it's gone. Everyone celebrates anyway - because there's no metric that punishes volatility.
Traditional SEO trained us to obsess over position.
AI search requires us to obsess over selection stability.
What "citation drift" actually is
"Drift" is the delta between what you believe is true ("we're a source for this topic") and what the engines actually do over time.
Drift happens because:
- retrieval + ranking weights change
- answer synthesis patterns shift
- your pages change (or competitors ship better evidence)
- crawlers re-parse your pages and lose confidence (rendering/provenance issues)
Industry tooling has started naming this explicitly ("citation drift") and recommending multi-window measurement (daily/weekly/monthly). That's directionally correct - but most teams still don't have an operational KPI they can baseline, trend, and tie to actions.
## CDI: the simplest usable definition
You want a metric that is:
- repeatable (same queries, same cadence)
- directional (improves as you become more reliable)
- actionable (when it moves, you know what to fix)
### The definition (minimal and usable)
For a fixed set of queries Q, measured across a fixed set of engines E, at time t:
- Let C(t) be the set of queries where your brand is cited (as a link, named source, or clearly attributed reference).
Then over two measurement points (t0 → t1):
- Retention = |C(t0) ∩ C(t1)| / |C(t0)|
- CDI = 1 - Retention
Interpretation:
- CDI near 0.0 → stable authority (good)
- CDI near 1.0 → volatile / fragile visibility (bad)
Optional (only if you have the pipeline):
- Engine consistency: do multiple engines cite you for the same query?
- Misattribution rate: are you cited under the wrong brand/page?
Start with retention. Complexity can wait.
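Here is a minimal sketch of that retention computation in Python, assuming you already have the set of cited queries at each measurement point (the function name and shape are illustrative, not a required API):

```python
def cdi(cited_t0: set, cited_t1: set) -> float:
    """Citation Drift Index between two measurement points.

    cited_t0 / cited_t1: the queries (from the fixed set Q) where the
    brand was cited at t0 and t1. Returns 1 - retention, so 0.0 means
    every citation held and 1.0 means every citation was lost.
    """
    if not cited_t0:
        return 0.0  # nothing was cited at t0, so there is nothing to retain yet
    retained = len(cited_t0 & cited_t1)
    return 1.0 - retained / len(cited_t0)
```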
## The audit method (that makes CDI real)
Here's the operating procedure. No dashboards required.
### 1) Build a query set you can defend
Aim for 30-100 queries.
Rules:
- 60-70% are "money queries" (core category, high intent)
- 20-30% are "proof queries" (comparisons, frameworks, standards)
- 10% are "edge queries" (hard cases that reveal weakness)
Store this list. Don't change it casually. If you do, version it.
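A versioned query set can live in a plain checked-in file. Here is a sketch using queries from the illustrative study later in this piece; the field names and exact proportions are just an example:

```python
QUERY_SET = {
    "version": "2025-06-01",  # bump this when you deliberately change the list
    "category": "running-shoes",
    "queries": {
        "money": [  # ~60-70%: core category, high intent
            "best sustainable running shoes 2025",
            "best running shoes for marathon training",
        ],
        "proof": [  # ~20-30%: comparisons, frameworks, standards
            "trail running shoes vs hiking boots",
        ],
        "edge": [  # ~10%: hard cases that reveal weakness
            "carbon plate running shoes pros and cons",
        ],
    },
}
```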
### 2) Fix the engines and the prompt format
Pick 2-3 environments you care about, for example:
- Google AI Overviews (where available)
- Perplexity
- ChatGPT (with browsing)
Use a consistent prompt template:
- ask for a concise answer
- ask for sources/citations
- keep query phrasing identical
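One way to hold the prompt constant across engines and weeks is sketched below; the template wording is an example, not a standard:

```python
PROMPT_TEMPLATE = (
    "Answer concisely: {query}\n"
    "List the sources you used, with URLs."
)

def build_prompt(query: str) -> str:
    # Identical phrasing for every run; only the query changes.
    return PROMPT_TEMPLATE.format(query=query)
```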
### 3) Measure on a cadence that exposes volatility
- Weekly is the best default.
- Add a daily window only for launches (first 7-10 days).
- Add a monthly rollup for strategic reporting.
Drift-aware evaluation is a known problem in other AI evaluation settings too: if you only look at one snapshot, you'll fool yourself.
### 4) Add two controls so you don't hallucinate progress
- Competitor control: track 1-2 competitors for the same queries
- Neutral-source control: track whether engines cite "known stable" sources in your vertical
Why: if citations reshuffle across the whole category, you don't want to blame your content.
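In practice, that means running the same retention math for the controls and reading your own movement relative to theirs. A sketch reusing the `cdi` function from the definition above; the entity names, data, and threshold are placeholders:

```python
# (queries cited at t0, queries cited at t1) per tracked entity -- placeholder data
snapshots = {
    "you":            ({"q1", "q2", "q3"}, {"q1", "q3"}),
    "competitor_a":   ({"q1", "q4"},       {"q1", "q4"}),
    "neutral_source": ({"q2", "q5"},       {"q5"}),
}

scores = {name: cdi(t0, t1) for name, (t0, t1) in snapshots.items()}

# If everyone's CDI jumped at once, suspect a category-wide reshuffle
# rather than a problem with your own content.
category_reshuffle = all(score > 0.3 for score in scores.values())
```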
### 5) Classify every "drop" with a cause category
When CDI worsens, you need diagnosis. Use a simple taxonomy:
- T0: Technical extractability (fails BVT)
- T1: Provenance ambiguity (fails Entity Anchors)
- T2: Weak evidence (claims without references)
- T3: Entity mismatch (wrong page cited, wrong brand name)
- T4: Competitive displacement (someone shipped a better artifact)
This turns CDI into an engineering backlog.
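Encoded as a lookup table, the taxonomy doubles as the label set for the Drift Trigger column in the audit sheet further down (a sketch; the codes mirror the list above):

```python
DRIFT_TRIGGERS = {
    "T0": "Technical extractability (fails BVT)",
    "T1": "Provenance ambiguity (fails Entity Anchors)",
    "T2": "Weak evidence (claims without references)",
    "T3": "Entity mismatch (wrong page or brand cited)",
    "T4": "Competitive displacement (better competing artifact)",
}

def label_drop(trigger_code: str) -> str:
    # Every dropped citation gets exactly one cause category.
    return f"{trigger_code}: {DRIFT_TRIGGERS[trigger_code]}"
```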
## A worked example (benchmark proxy)
Assume:
- Q = 50 queries
- E = 2 engines
- Week 1: you're cited on 14/50 queries → |C(t0)| = 14
- Week 2: you're cited on 16/50 queries → |C(t1)| = 16
- Overlap: 10 queries → |C(t0) ∩ C(t1)| = 10
Retention = 10/14 ≈ 0.71
CDI = 1 - 0.71 = 0.29
How to read that:
- You didn't "lose visibility." You lost durability.
- 4 queries dropped. Your job is to classify the cause category for each drop.
Now compare:
- If a competitor's CDI is 0.10, they're harder to displace.
- If your CDI drops from 0.45 → 0.25 over 6 weeks, you're compounding.
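As a sanity check, here is the worked example pushed through the `cdi` sketch from earlier; the query IDs are placeholders, and only the counts and the overlap matter:

```python
week_1 = {f"q{i}" for i in range(1, 15)}    # 14 cited queries at t0
week_2 = {f"q{i}" for i in range(5, 21)}    # 16 cited queries at t1; 10 overlap with week 1

assert len(week_1 & week_2) == 10
print(round(cdi(week_1, week_2), 2))        # -> 0.29
```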
## CDI Audit Sheet: template & illustrative study
### The template you can use
Below is a minimal CDI audit table. Copy it, fill it in weekly, and track retention.
| Query | Engine | Week 1 Status | Week 2 Status | Week 3 Status | Week 4 Status | Drift Trigger |
| ------- | -------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| [your query] | [engine] | Cited / Not Cited | Cited / Not Cited | Cited / Not Cited | Cited / Not Cited | [T0/T1/T2/T3/T4 or —] |
Column guide:
- Query: the exact phrasing you tested
- Engine: which AI environment (Perplexity, ChatGPT, Google AI Overviews, etc.)
- Week X Status: "Cited" (you appeared as a source), "Not Cited" (you didn't), or "Misattributed" (wrong brand/page)
- Drift Trigger: when a citation drops, classify the cause (T0 = technical, T1 = provenance, T2 = weak evidence, T3 = entity mismatch, T4 = competitive displacement)
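If you keep the sheet as a CSV with those column headers, week-over-week CDI falls out in a few lines. A sketch that reuses the `cdi` function from earlier; the file name is hypothetical, and "Misattributed" is deliberately not counted as a clean citation:

```python
import csv
from collections import defaultdict

WEEKS = ["Week 1 Status", "Week 2 Status", "Week 3 Status", "Week 4 Status"]

def weekly_cited_sets(path: str) -> dict:
    """Per week column, return the set of (query, engine) pairs marked 'Cited'."""
    cited = defaultdict(set)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for week in WEEKS:
                if row[week].strip() == "Cited":
                    cited[week].add((row["Query"], row["Engine"]))
    return cited

cited = weekly_cited_sets("cdi_audit.csv")   # hypothetical export of the sheet above

for prev, curr in zip(WEEKS, WEEKS[1:]):
    print(f"{prev} -> {curr}: CDI = {cdi(cited[prev], cited[curr]):.2f}")
```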
### Illustrative baseline study (Weeks 1-4)
Note: The data below is an illustrative example based on observed industry patterns. It is not a live case study, but reflects typical citation volatility across engines.
The table tracks 10 representative queries across 2 engines (ChatGPT with browsing, Perplexity) and shows citation status for a hypothetical e-commerce brand publishing product guides and category content.
| Query | Engine | Week 1 | Week 2 | Week 3 | Week 4 | Drift Trigger |
| ------- | -------- | -------- | -------- | -------- | -------- | --------------- |
| best sustainable running shoes 2025 | ChatGPT | Cited | Cited | Cited | Cited | — |
| best sustainable running shoes 2025 | Perplexity | Cited | Not Cited | Cited | Cited | T4 (competitor launched guide) |
| how to choose running shoes for flat feet | ChatGPT | Cited | Cited | Cited | Not Cited | T2 (no references in answer) |
| how to choose running shoes for flat feet | Perplexity | Not Cited | Not Cited | Not Cited | Not Cited | T0 (JS render issue) |
| trail running shoes vs hiking boots | ChatGPT | Cited | Misattributed | Cited | Cited | T3 (wrong URL cited week 2) |
| trail running shoes vs hiking boots | Perplexity | Cited | Cited | Cited | Cited | — |
| waterproof running shoes comparison 2025 | ChatGPT | Not Cited | Cited | Cited | Cited | — (gained week 2) |
| waterproof running shoes comparison 2025 | Perplexity | Cited | Cited | Not Cited | Cited | T4 (competitor artifact) |
| running shoe size guide by brand | ChatGPT | Cited | Cited | Cited | Cited | — |
| running shoe size guide by brand | Perplexity | Cited | Cited | Cited | Not Cited | T1 (provenance ambiguity) |
| carbon plate running shoes pros and cons | ChatGPT | Not Cited | Not Cited | Not Cited | Not Cited | T2 (weak evidence) |
| carbon plate running shoes pros and cons | Perplexity | Not Cited | Cited | Cited | Cited | — (gained week 2) |
| best running shoes for marathon training | ChatGPT | Cited | Cited | Not Cited | Cited | T3 (entity mismatch) |
| best running shoes for marathon training | Perplexity | Cited | Cited | Cited | Cited | — |
| cushioned vs minimalist running shoes | ChatGPT | Cited | Not Cited | Not Cited | Cited | T2 (evidence gap) |
| cushioned vs minimalist running shoes | Perplexity | Cited | Cited | Cited | Cited | — |
| running shoes for wide feet recommendations | ChatGPT | Not Cited | Not Cited | Cited | Cited | — (gained week 3) |
| running shoes for wide feet recommendations | Perplexity | Not Cited | Not Cited | Not Cited | Cited | — (gained week 4) |
| how long do running shoes last | ChatGPT | Cited | Cited | Cited | Cited | — |
| how long do running shoes last | Perplexity | Cited | Cited | Cited | Cited | — |
Summary stats (Week 1 → Week 4), counting "Misattributed" as not cited:
- Week 1: cited on 14/20 query-engine pairs (70%)
- Week 4: cited on 16/20 query-engine pairs (80%)
- Retention (Week 1 → Week 2): 11/14 ≈ 0.79 → CDI ≈ 0.21
- Retention (Week 3 → Week 4): 12/14 ≈ 0.86 → CDI ≈ 0.14
What this tells you:
- Early volatility (CDI ≈ 0.21) → weak evidence and technical issues still unresolved
- Late stability (CDI ≈ 0.14) → the fixes are compounding
- Perplexity was slightly more stable than ChatGPT for this query set
- Drift triggers cluster around T2 (weak evidence), with T3/T4 (entity mismatch, competitive displacement) close behind
Diagnosis backlog:
- Fix T0 issues (JS render for "flat feet" query)
- Strengthen T2 pages (add references, data tables)
- Monitor T4 competitive moves (track what guides competitors launched)
- Resolve T1/T3 provenance issues (Entity Anchors audit)
This is the operational loop. Track → diagnose → fix → re-measure.
## What actually lowers CDI (and what doesn't)
### What lowers CDI
1) Technical honesty (BVT pass)
If engines can't reliably ingest your page, citations will be unstable by definition.
2) Provenance clarity (Entity Anchors)
Ambiguous publisher/author/dates increase misattribution and reduce trust.
3) Evidence density (with references)
Citations stick when your page is:
- specific
- attributable
- supported
4) Category language consistency
Don't fragment how engines model your brand: reuse the same terminology and keep strengthening your topic clusters.
5) System design (Citation Stack thinking)
Mentions are assembled across sources. Isolated one-off pages create fragile exposure.
### What does not lower CDI (long-term)
- prompt tweaks
- "LLM-friendly writing" without evidence and provenance
- viral one-offs without a system
## How GEOOptimizer measures and operationalizes this
In GEOOptimizer, CDI becomes a productized audit loop:
- Maintain a versioned query set per category
- Run scheduled checks (weekly + launch windows)
- Extract:
- whether you were cited
- which URL was cited
- whether competitors were cited instead
- misattribution signals (wrong brand / wrong page)
- Compute:
- CDI (retention)
- engine consistency
- misattribution rate
- Auto-diagnose using prerequisites:
- BVT failures (bot-empty pages)
- missing/weak Entity Anchors
- evidence gaps (claims without references)
And yes - discovery helpers like llms.txt can support the program.
But CDI keeps the team honest: did it make visibility durable?
## The standard you should adopt
If you want one rule to align the team:
No weekly CDI report → no GEO roadmap.
Because if you can't measure stability, you can't tell whether your authority is compounding - or evaporating.
## References
- Schema.org - Article: https://schema.org/Article
- Google Search Central - JavaScript SEO: https://developers.google.com/search/docs/crawling-indexing/javascript
- Google Search Central - Structured data guidelines: https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data
- AirOps - Citation drift in AI search: https://www.airops.com/ai-search-hub/how-to-measure-and-manage-citation-drift-in-ai-search
- GrowByData - Citation tracking in the age of AI: https://growbydata.com/understanding-citation-tracking-in-the-age-of-ai/
- Yotpo - llms.txt guide (industry mainstreaming): https://www.yotpo.com/blog/what-is-llms-txt/
- arXiv - drift-aware evaluation motivation: https://arxiv.org/abs/2511.04964



