# The Citation Drift Index (CDI): the metric that tells you if you're becoming a durable source in AI answers
## Executive summary
Most teams measure "AI visibility" the way they measured early SEO wins: screenshots.
That's a trap.
In AI answers, the real question isn't "Did we get cited once?" It's: "Can we stay cited?"
So here's a standard you can actually operate:
- Citation Drift Index (CDI) → a drift-aware KPI for how stable your citations are over time
- A practical audit method (query set + cadence + controls)
- A diagnostic taxonomy that turns volatility into a backlog
If you already publish in a way that machines can reliably ingest (the Bot-Visible Test, BVT) and you've fixed provenance ambiguity (Entity Anchors), CDI becomes the KPI that tells you whether your authority is compounding.
For execution details, pair CDI with a concrete implementation checklist (The Citation-Ready GEO Checklist) and the system-level view of how mentions are assembled (The Citation Stack).
## The contrarian claim
If you don't track drift, you're not doing GEO.
You're doing AI mention tourism.
One week your brand appears in an answer. Next week it's gone. Everyone celebrates anyway - because there's no metric that punishes volatility.
Traditional SEO trained us to obsess over position.
AI search requires us to obsess over selection stability.
What "citation drift" actually is
"Drift" is the delta between what you believe is true ("we're a source for this topic") and what the engines actually do over time.
Drift happens because:
- retrieval + ranking weights change
- answer synthesis patterns shift
- your pages change (or competitors ship better evidence)
- crawlers re-parse your pages and lose confidence (rendering/provenance issues)
Industry tooling has started naming this explicitly ("citation drift") and recommending multi-window measurement (daily/weekly/monthly). That's directionally correct - but most teams still don't have an operational KPI they can baseline, trend, and tie to actions.
## CDI: the simplest usable definition
You want a metric that is:
- repeatable (same queries, same cadence)
- directional (improves as you become more reliable)
- actionable (when it moves, you know what to fix)
### The definition (minimal and usable)
For a fixed set of queries Q, measured across a fixed set of engines E, at time t:
- Let C(t) be the set of queries where your brand is cited (as a link, named source, or clearly attributed reference).
Then over two measurement points (t0 → t1):
- Retention = |C(t0) ∩ C(t1)| / |C(t0)|
- CDI = 1 - Retention
Interpretation:
- CDI near 0.0 → stable authority (good)
- CDI near 1.0 → volatile / fragile visibility (bad)
Optional (only if you have the pipeline):
- Engine consistency: do multiple engines cite you for the same query?
- Misattribution rate: are you cited under the wrong brand/page?
Start with retention. Complexity can wait.
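Here is a minimal sketch of that retention computation in Python, assuming you already have the set of cited queries at each measurement point (the function name and shape are illustrative, not a required API):

```python
def cdi(cited_t0: set, cited_t1: set) -> float:
    """Citation Drift Index between two measurement points.

    cited_t0 / cited_t1: the queries (from the fixed set Q) where the
    brand was cited at t0 and t1. Returns 1 - retention, so 0.0 means
    every citation held and 1.0 means every citation was lost.
    """
    if not cited_t0:
        return 0.0  # nothing was cited at t0, so there is nothing to retain yet
    retained = len(cited_t0 & cited_t1)
    return 1.0 - retained / len(cited_t0)
```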
## The audit method (that makes CDI real)
Here's the operating procedure. No dashboards required.
### 1) Build a query set you can defend
Aim for 30-100 queries.
Rules:
- 60-70% are "money queries" (core category, high intent)
- 20-30% are "proof queries" (comparisons, frameworks, standards)
- 10% are "edge queries" (hard cases that reveal weakness)
Store this list. Don't change it casually. If you do, version it.
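A versioned query set can live in a plain checked-in file. Here is a sketch using queries from the illustrative study later in this piece; the field names and exact proportions are just an example:

```python
QUERY_SET = {
    "version": "2025-06-01",  # bump this when you deliberately change the list
    "category": "running-shoes",
    "queries": {
        "money": [  # ~60-70%: core category, high intent
            "best sustainable running shoes 2025",
            "best running shoes for marathon training",
        ],
        "proof": [  # ~20-30%: comparisons, frameworks, standards
            "trail running shoes vs hiking boots",
        ],
        "edge": [  # ~10%: hard cases that reveal weakness
            "carbon plate running shoes pros and cons",
        ],
    },
}
```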
### 2) Fix the engines and the prompt format
Pick 2-3 environments you care about, for example:
- Google AI Overviews (where available)
- Perplexity
- ChatGPT (with browsing)
Use a consistent prompt template:
- ask for a concise answer
- ask for sources/citations
- keep query phrasing identical
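One way to hold the prompt constant across engines and weeks is sketched below; the template wording is an example, not a standard:

```python
PROMPT_TEMPLATE = (
    "Answer concisely: {query}\n"
    "List the sources you used, with URLs."
)

def build_prompt(query: str) -> str:
    # Identical phrasing for every run; only the query changes.
    return PROMPT_TEMPLATE.format(query=query)
```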
### 3) Measure on a cadence that exposes volatility
- Weekly is the best default.
- Add a daily window only for launches (first 7-10 days).
- Add a monthly rollup for strategic reporting.
Drift-aware evaluation is a known problem in other AI evaluation settings too: if you only look at one snapshot, you'll fool yourself.
### 4) Add two controls so you don't hallucinate progress
- Competitor control: track 1-2 competitors for the same queries
- Neutral-source control: track whether engines cite "known stable" sources in your vertical
Why: if citations reshuffle across the whole category, you don't want to blame your content.
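In practice, that means running the same retention math for the controls and reading your own movement relative to theirs. A sketch reusing the `cdi` function from the definition above; the entity names, data, and threshold are placeholders:

```python
# (queries cited at t0, queries cited at t1) per tracked entity -- placeholder data
snapshots = {
    "you":            ({"q1", "q2", "q3"}, {"q1", "q3"}),
    "competitor_a":   ({"q1", "q4"},       {"q1", "q4"}),
    "neutral_source": ({"q2", "q5"},       {"q5"}),
}

scores = {name: cdi(t0, t1) for name, (t0, t1) in snapshots.items()}

# If everyone's CDI jumped at once, suspect a category-wide reshuffle
# rather than a problem with your own content.
category_reshuffle = all(score > 0.3 for score in scores.values())
```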
### 5) Classify every "drop" with a cause category
When CDI worsens, you need diagnosis. Use a simple taxonomy:
- T0: Technical extractability (fails BVT)
- T1: Provenance ambiguity (fails Entity Anchors)
- T2: Weak evidence (claims without references)
- T3: Entity mismatch (wrong page cited, wrong brand name)
- T4: Competitive displacement (someone shipped a better artifact)
This turns CDI into an engineering backlog.
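Encoded as a lookup table, the taxonomy doubles as the label set for the Drift Trigger column in the audit sheet further down (a sketch; the codes mirror the list above):

```python
DRIFT_TRIGGERS = {
    "T0": "Technical extractability (fails BVT)",
    "T1": "Provenance ambiguity (fails Entity Anchors)",
    "T2": "Weak evidence (claims without references)",
    "T3": "Entity mismatch (wrong page or brand cited)",
    "T4": "Competitive displacement (better competing artifact)",
}

def label_drop(trigger_code: str) -> str:
    # Every dropped citation gets exactly one cause category.
    return f"{trigger_code}: {DRIFT_TRIGGERS[trigger_code]}"
```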
## A worked example (benchmark proxy)
Assume:
- Q = 50 queries
- E = 2 engines
- Week 1: you're cited on 14/50 queries → |C(t0)| = 14
- Week 2: you're cited on 16/50 queries → |C(t1)| = 16
- Overlap: 10 queries → |C(t0) ∩ C(t1)| = 10
Retention = 10/14 ≈ 0.71
CDI = 1 - 0.71 = 0.29
How to read that:
- You didn't "lose visibility." You lost durability.
- 4 queries dropped. Your job is to classify the cause category for each drop.
Now compare:
- If a competitor's CDI is 0.10, they're harder to displace.
- If your CDI drops from 0.45 → 0.25 over 6 weeks, you're compounding.
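As a sanity check, here is the worked example pushed through the `cdi` sketch from earlier; the query IDs are placeholders, and only the counts and the overlap matter:

```python
week_1 = {f"q{i}" for i in range(1, 15)}    # 14 cited queries at t0
week_2 = {f"q{i}" for i in range(5, 21)}    # 16 cited queries at t1; 10 overlap with week 1

assert len(week_1 & week_2) == 10
print(round(cdi(week_1, week_2), 2))        # -> 0.29
```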
## CDI Audit Sheet: template & illustrative study
### The template you can use
Below is a minimal CDI audit table. Copy it, fill it in weekly, and track retention.
| Query | Engine | Week 1 Status | Week 2 Status | Week 3 Status | Week 4 Status | Drift Trigger |
| ------- | -------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| [your query] | [engine] | Cited / Not Cited | Cited / Not Cited | Cited / Not Cited | Cited / Not Cited | [T0/T1/T2/T3/T4 or —] |
Column guide:
- Query: the exact phrasing you tested
- Engine: which AI environment (Perplexity, ChatGPT, Google AI Overviews, etc.)
- Week X Status: "Cited" (you appeared as a source), "Not Cited" (you didn't), or "Misattributed" (wrong brand/page)
- Drift Trigger: when a citation drops, classify the cause (T0 = technical, T1 = provenance, T2 = weak evidence, T3 = entity mismatch, T4 = competitive displacement)
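If you keep the sheet as a CSV with those column headers, week-over-week CDI falls out in a few lines. A sketch that reuses the `cdi` function from earlier; the file name is hypothetical, and "Misattributed" is deliberately not counted as a clean citation:

```python
import csv
from collections import defaultdict

WEEKS = ["Week 1 Status", "Week 2 Status", "Week 3 Status", "Week 4 Status"]

def weekly_cited_sets(path: str) -> dict:
    """Per week column, return the set of (query, engine) pairs marked 'Cited'."""
    cited = defaultdict(set)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for week in WEEKS:
                if row[week].strip() == "Cited":
                    cited[week].add((row["Query"], row["Engine"]))
    return cited

cited = weekly_cited_sets("cdi_audit.csv")   # hypothetical export of the sheet above

for prev, curr in zip(WEEKS, WEEKS[1:]):
    print(f"{prev} -> {curr}: CDI = {cdi(cited[prev], cited[curr]):.2f}")
```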
### Illustrative baseline study (Weeks 1-4)
Note: The data below is an illustrative example based on observed industry patterns. It is not a live case study, but reflects typical citation volatility across engines.
The table tracks 10 representative queries across 2 engines (ChatGPT with browsing, Perplexity) and shows citation status for a hypothetical e-commerce brand publishing product guides and category content.
| Query | Engine | Week 1 | Week 2 | Week 3 | Week 4 | Drift Trigger |
| ------- | -------- | -------- | -------- | -------- | -------- | --------------- |
| best sustainable running shoes 2025 | ChatGPT | Cited | Cited | Cited | Cited | — |
| best sustainable running shoes 2025 | Perplexity | Cited | Not Cited | Cited | Cited | T4 (competitor launched guide) |
| how to choose running shoes for flat feet | ChatGPT | Cited | Cited | Cited | Not Cited | T2 (no references in answer) |
| how to choose running shoes for flat feet | Perplexity | Not Cited | Not Cited | Not Cited | Not Cited | T0 (JS render issue) |
| trail running shoes vs hiking boots | ChatGPT | Cited | Misattributed | Cited | Cited | T3 (wrong URL cited week 2) |
| trail running shoes vs hiking boots | Perplexity | Cited | Cited | Cited | Cited | — |
| waterproof running shoes comparison 2025 | ChatGPT | Not Cited | Cited | Cited | Cited | — (gained week 2) |
| waterproof running shoes comparison 2025 | Perplexity | Cited | Cited | Not Cited | Cited | T4 (competitor artifact) |
| running shoe size guide by brand | ChatGPT | Cited | Cited | Cited | Cited | — |
| running shoe size guide by brand | Perplexity | Cited | Cited | Cited | Not Cited | T1 (provenance ambiguity) |
| carbon plate running shoes pros and cons | ChatGPT | Not Cited | Not Cited | Not Cited | Not Cited | T2 (weak evidence) |
| carbon plate running shoes pros and cons | Perplexity | Not Cited | Cited | Cited | Cited | — (gained week 2) |
| best running shoes for marathon training | ChatGPT | Cited | Cited | Not Cited | Cited | T3 (entity mismatch) |
| best running shoes for marathon training | Perplexity | Cited | Cited | Cited | Cited | — |
| cushioned vs minimalist running shoes | ChatGPT | Cited | Not Cited | Not Cited | Cited | T2 (evidence gap) |
| cushioned vs minimalist running shoes | Perplexity | Cited | Cited | Cited | Cited | — |
| running shoes for wide feet recommendations | ChatGPT | Not Cited | Not Cited | Cited | Cited | — (gained week 3) |
| running shoes for wide feet recommendations | Perplexity | Not Cited | Not Cited | Not Cited | Cited | — (gained week 4) |
| how long do running shoes last | ChatGPT | Cited | Cited | Cited | Cited | — |
| how long do running shoes last | Perplexity | Cited | Cited | Cited | Cited | — |
Summary stats (Week 1 → Week 4), counting "Misattributed" as not cited:
- Week 1: cited on 14/20 query-engine pairs (70%)
- Week 4: cited on 16/20 query-engine pairs (80%)
- Retention (Week 1 → Week 2): 11/14 ≈ 0.79 → CDI ≈ 0.21
- Retention (Week 3 → Week 4): 12/14 ≈ 0.86 → CDI ≈ 0.14
What this tells you:
- Early volatility (CDI ≈ 0.21) → weak evidence and technical issues still unresolved
- Late stability (CDI ≈ 0.14) → the fixes are compounding
- Perplexity was slightly more stable than ChatGPT for this query set
- Drift triggers cluster around T2 (weak evidence), with T3/T4 (entity mismatch, competitive displacement) close behind
Diagnosis backlog:
- Fix T0 issues (JS render for "flat feet" query)
- Strengthen T2 pages (add references, data tables)
- Monitor T4 competitive moves (track what guides competitors launched)
- Resolve T1/T3 provenance issues (Entity Anchors audit)
This is the operational loop. Track → diagnose → fix → re-measure.
## What actually lowers CDI (and what doesn't)
### What lowers CDI
1) Technical honesty (BVT pass)
If engines can't reliably ingest your page, citations will be unstable by definition.
2) Provenance clarity (Entity Anchors)
Ambiguous publisher/author/dates increase misattribution and reduce trust.
3) Evidence density (with references)
Citations stick when your page is:
- specific
- attributable
- supported
4) Category language consistency
Don't fragment how engines model your brand: reuse the same terminology and keep strengthening your topic clusters.
5) System design (Citation Stack thinking)
Mentions are assembled across sources. Isolated one-off pages create fragile exposure.
### What does not lower CDI (long-term)
- prompt tweaks
- "LLM-friendly writing" without evidence and provenance
- viral one-offs without a system
## How GEOOptimizer measures and operationalizes this
In GEOOptimizer, CDI becomes a productized audit loop:
- Maintain a versioned query set per category
- Run scheduled checks (weekly + launch windows)
- Extract:
- whether you were cited
- which URL was cited
- whether competitors were cited instead
- misattribution signals (wrong brand / wrong page)
- Compute:
- CDI (retention)
- engine consistency
- misattribution rate
- Auto-diagnose using prerequisites:
- BVT failures (bot-empty pages)
- missing/weak Entity Anchors
- evidence gaps (claims without references)
And yes - discovery helpers like llms.txt can support the program.
But CDI keeps the team honest: did it make visibility durable?
## The standard you should adopt
If you want one rule to align the team:
No weekly CDI report → no GEO roadmap.
Because if you can't measure stability, you can't tell whether your authority is compounding - or evaporating.
## References
- Schema.org - Article: https://schema.org/Article
- Google Search Central - JavaScript SEO: https://developers.google.com/search/docs/crawling-indexing/javascript
- Google Search Central - Structured data guidelines: https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data
- AirOps - Citation drift in AI search: https://www.airops.com/ai-search-hub/how-to-measure-and-manage-citation-drift-in-ai-search
- GrowByData - Citation tracking in the age of AI: https://growbydata.com/understanding-citation-tracking-in-the-age-of-ai/
- Yotpo - llms.txt guide (industry mainstreaming): https://www.yotpo.com/blog/what-is-llms-txt/
- arXiv - drift-aware evaluation motivation: https://arxiv.org/abs/2511.04964



