{"@type": "Article"}
    "headline"
    "author": {...}
    "datePublished"
    Standards

    The Bot‑Visible Test (BVT): the 5‑minute audit that predicts whether LLMs can cite you

    A 5‑minute pass/fail audit (BVT) that catches the #1 hidden GEO failure: bot-empty pages caused by client-side rendering.

    February 9, 2026 · 6 min read

    Executive summary

    Most GEO advice starts with "write better answers."

    That's not the first bottleneck.

    If a crawler fetches your URL and gets an empty app shell (generic title tag, generic meta description, no article text, no author/dates), you are not "optimizing." You are not publishing — at least not to systems that don't fully execute JavaScript.

    I'll prove the failure mode with a simple fetch test, then give you a pass/fail standard you can run in 5 minutes:

    • BVT (Bot‑Visible Test) → a pragmatic standard for "is this page real to crawlers?"
    • BVCR (Bot‑Visible Content Ratio) → a scoring lens so teams can track improvement
    • A remediation hierarchy (SSR/SSG, then dynamic rendering, then head-only band-aids)

    Once you pass BVT, then it's worth doing the higher-leverage GEO work: provenance (Entity Anchors), extraction design (Answer‑First Content), and system-level mention engineering (The Citation Stack).


    The contrarian claim (and why it's true)

    A lot of "GEO" doesn't fail because the content is weak.

    It fails because the page is bot-empty.

    Modern sites (especially SPAs) often return:

    • a generic HTML shell,
    • a JS bundle,
    • and the actual article only after client-side rendering.

    Google has long documented that JavaScript rendering can be delayed, conditional, or skipped in certain situations: a noindex directive encountered in the initial HTML, for example, can prevent rendering from running at all, and JS-heavy delivery adds complexity and risk in general. Practitioner breakdowns that track Google's documented behavior make this operationally obvious (see Search Engine Journal's coverage of Google's clarifications and the supporting ecosystem of JS SEO diagnostics) [1][2][3].

    If a major search engine needs special care to reliably ingest JS sites, assume AI crawlers are at least as brittle.

    BVT is the sanity check that prevents you from wasting months "optimizing" pages that your target systems never truly received.


    BVT (Bot‑Visible Test): pass/fail standard

    BVT is intentionally simple: no tools, no dashboards.

    Pass (minimum viable): in the initial HTML response (no JS execution), the crawler can extract:

    1) a page title that matches the article

    2) a headline (H1) or close variant

    3) a meaningful chunk of article body text

    4) basic provenance signals: author/publisher + publish/modified dates (as visible text or JSON‑LD)

    If you fail any of these, your citations will be inconsistent at best.
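    The four pass criteria above can be sketched as an automated check. This is a minimal illustration, not a shipped tool: the tag-stripping regexes are deliberately crude, the 200-word body threshold is an assumption I've chosen for illustration, and `fetch_html` is a plain non-JS fetch so it sees exactly what a non-rendering crawler sees.

    ```python
    # Minimal BVT sketch: fetch a URL without executing JavaScript and check
    # the four pass criteria against the raw HTML. The min_body_words
    # threshold is an illustrative assumption, not part of the BVT standard.
    import re
    import urllib.request


    def fetch_html(url: str) -> str:
        """Fetch the initial HTML response as a non-JS client would."""
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8", errors="replace")


    def bvt_pass(html: str, headline: str, min_body_words: int = 200) -> dict:
        """Return a dict of the four BVT pass criteria for raw (pre-JS) HTML."""
        # Drop scripts/styles, then strip remaining tags to get visible text.
        text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                      flags=re.S | re.I)
        text = re.sub(r"<[^>]+>", " ", text)
        title = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
        return {
            "title_matches": bool(title and headline.lower() in title.group(1).lower()),
            "headline_in_html": headline.lower() in text.lower(),
            "body_present": len(text.split()) >= min_body_words,
            # Provenance via JSON-LD keys; visible-text bylines also count
            # in the standard but are harder to detect generically.
            "provenance": bool(re.search(r'"datePublished"|"author"', html)),
        }
    ```

    Running `bvt_pass(fetch_html(url), "Your Headline")` against a bot-empty SPA route typically returns `False` on every key, which is exactly the failure the test exists to surface.
    
    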

    BVCR: Bot‑Visible Content Ratio

    A useful scoring lens so teams can improve iteratively:

    BVCR = bot-visible meaningful words / human-visible meaningful words

    Interpretation:

    • 0–20% (FAIL): you're effectively not publishing to non-rendering crawlers.
    • 20–80% (RISK): partial extraction; citations may be missing or wrong.
    • 80–100% (PASS): now you can compete on authority, evidence, and entity clarity.

    BVCR doesn't need to be perfect to be useful. It only needs to tell you: "Are we shipping bot-empty pages?"
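    A minimal sketch of the BVCR computation, under one stated assumption: the human-visible word count is supplied by the caller (for example, counted from a headless-browser render), since this check deliberately avoids executing JavaScript itself. The band thresholds follow the interpretation above.

    ```python
    # Rough BVCR proxy: compare the word count a non-JS bot sees in the raw
    # HTML against the word count a human sees on the rendered page.
    import re


    def visible_words(html: str) -> int:
        """Count words in HTML after dropping scripts, styles, and tags."""
        text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                      flags=re.S | re.I)
        text = re.sub(r"<[^>]+>", " ", text)
        return len(text.split())


    def bvcr(bot_html: str, human_word_count: int) -> float:
        """Bot-visible words / human-visible words, clamped to [0, 1]."""
        if human_word_count == 0:
            return 0.0
        return min(1.0, visible_words(bot_html) / human_word_count)


    def verdict(ratio: float) -> str:
        """Map a BVCR ratio onto the FAIL / RISK / PASS bands."""
        if ratio < 0.20:
            return "FAIL"
        if ratio < 0.80:
            return "RISK"
        return "PASS"
    ```

    The point of the clamp and the coarse bands is that BVCR is a direction-finder, not a precision metric: it only has to distinguish "bot-empty" from "mostly there."
    
    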


    Measured example: the "generic head" symptom

    Here is the fastest possible BVT.

    Command to run:

    curl -A 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36' -L 'https://example.com/blog/some-article' | head -n 20

    In a real test we ran on a live SPA blog route, the initial HTML returned:

    • title tag: a generic product/platform title (not the article title)
    • meta description tag: a generic homepage-style description

    That's a classic SPA pattern: the server delivers a generic shell; the real article (title/body/provenance) only appears after JavaScript runs.

    Why this matters:

    • A crawler that doesn't execute JS will associate this URL with generic metadata.
    • A crawler that sometimes executes JS will produce inconsistent extraction and attribution.

    This is the silent killer of GEO programs: teams improve content quality, but the delivery layer prevents ingestion.


    How to run BVT on any page (5 minutes)

    Step 1 — Fetch the first HTML response

    Run this command:

    curl -s -A 'Mozilla/5.0' -L 'https://yourdomain.com/your-article' | sed -n '1,200p'

    What you want to see:

    • a specific title tag
    • a specific meta description
    • the H1/headline
    • real paragraphs from the article body

    What failing looks like:

    • mostly scripts + div placeholders
    • "App", "Platform", or other generic title/description
    • no identifiable article text

    Step 2 — Confirm your headline exists in the HTML

    Run:

    curl -s -A 'Mozilla/5.0' -L 'https://yourdomain.com/your-article' | grep -i -n 'your headline' | head

    No match? BVCR is likely near zero.

    Step 3 — Check for structured provenance

    You don't need to overthink schema, but you do need the basics.

    curl -s -A 'Mozilla/5.0' -L 'https://yourdomain.com/your-article' | grep -i 'application/ld+json' | head

    Then validate the JSON‑LD includes (at minimum):

    • @type: Article (or BlogPosting / NewsArticle)
    • headline
    • author and publisher
    • datePublished and/or dateModified

    Schema.org's Article type documents the baseline vocabulary, including the articleBody property for machine-readable article text [4].
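    The JSON-LD validation step can be sketched as follows. This is an illustrative check, not a full schema validator: it pulls each `application/ld+json` block out of the raw HTML and verifies only the minimum fields listed above (it doesn't handle `@graph` wrappers or arrays of objects, which real pages often use).

    ```python
    # Extract JSON-LD blocks from raw HTML and check the minimum Article
    # provenance fields. Field names follow schema.org/Article.
    import json
    import re

    REQUIRED_FIELDS = ("headline", "author", "publisher")
    ARTICLE_TYPES = {"Article", "BlogPosting", "NewsArticle"}


    def check_jsonld(html: str) -> list:
        """Return one result dict per JSON-LD script block found."""
        results = []
        pattern = r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>'
        for block in re.findall(pattern, html, re.S | re.I):
            try:
                data = json.loads(block)
            except json.JSONDecodeError:
                results.append({"valid_json": False})
                continue
            results.append({
                "valid_json": True,
                "is_article": data.get("@type") in ARTICLE_TYPES,
                "missing": [k for k in REQUIRED_FIELDS if k not in data],
                "has_dates": "datePublished" in data or "dateModified" in data,
            })
        return results
    ```

    An empty return list is itself a finding: the page ships no structured provenance at all.
    
    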


    Remediation hierarchy (what actually fixes BVT)

    If BVT fails, you don't need "more GEO." You need a more honest publishing surface.

    1) SSR or SSG for content routes (recommended)

    If you can: make blog pages server-side rendered or statically generated.

    This yields:

    • stable HTML,
    • stable metadata,
    • consistent extraction.

    2) Pre-render critical routes (pragmatic)

    If full SSR is hard, pre-render just:

    • top posts
    • money pages
    • pillar pages

    3) Dynamic rendering (last resort)

    Dynamic rendering can be a bridge, not a foundation.

    4) Head-only band-aids (better than nothing)

    If you can't fix rendering yet, at minimum fix "head fidelity":

    • correct title tag and meta description
    • canonical URL
    • OpenGraph/Twitter tags

    This may reduce misattribution, but it's still fragile without body content.
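    As a concrete reference for "head fidelity," a minimal head might look like the fragment below. Every value is a placeholder to adapt to your page; the tag set mirrors the checklist above.

    ```html
    <!-- Minimal "head fidelity" example; all values are placeholders. -->
    <head>
      <title>Your Article Title — Your Site</title>
      <meta name="description" content="A specific one-sentence summary of this article.">
      <link rel="canonical" href="https://yourdomain.com/your-article">
      <meta property="og:title" content="Your Article Title">
      <meta property="og:description" content="A specific one-sentence summary of this article.">
      <meta property="og:type" content="article">
      <meta name="twitter:card" content="summary_large_image">
    </head>
    ```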


    Where Entity Anchors and the Citation Stack belong (after BVT passes)

    Once Stage 0 (bot-visible HTML) is solved, the leverage shifts to provenance (Entity Anchors), extraction design (Answer‑First Content), and system-level mention engineering (The Citation Stack).

    If you want one mental model:

    BVT is the gate. Entity Anchors are the lock. Evidence is the currency.

    How GEOOptimizer operationalizes this

    In GEOOptimizer, BVT becomes an automated check:

    • Fetch the page as a non-JS client.
    • Extract: title, meta description, H1/H2s, visible text, JSON‑LD.
    • Compute a BVCR proxy (bot-visible word count + headline/provenance presence).
    • Flag failures with concrete actions:

    - "Generic head" symptom

    - Missing Article/BlogPosting JSON‑LD

    - Missing author/publisher/dates
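    The flagging step above can be sketched as a small function that maps extracted signals to the concrete failure labels. This is an illustration of the idea, not GEOOptimizer's actual implementation, and the generic-title list is an assumed heuristic.

    ```python
    # Map extracted page signals to the three concrete failure flags.
    # GENERIC_TITLES is an assumed heuristic, not an exhaustive list.
    GENERIC_TITLES = {"app", "platform", "home", "welcome"}


    def flag_failures(title: str, has_article_jsonld: bool,
                      has_author: bool, has_dates: bool) -> list:
        """Return the list of concrete remediation flags for a page."""
        flags = []
        if title.strip().lower() in GENERIC_TITLES:
            flags.append('"Generic head" symptom')
        if not has_article_jsonld:
            flags.append("Missing Article/BlogPosting JSON-LD")
        if not (has_author and has_dates):
            flags.append("Missing author/publisher/dates")
        return flags
    ```

    A bot-empty SPA shell typically trips all three flags at once, which is a strong signal that the fix is rendering, not content.
    
    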

    The goal is not score theater.

    The goal is to stop shipping pages that look like nothing to the systems you're trying to influence.


    The standard you should adopt

    If you want a hard rule to align the team:

    No BVT pass → no GEO sprint.

    Because you can't optimize what the crawler never received.


    References

    [1] Search Engine Journal — Google warns noindex can block JavaScript from running (coverage of Google JS SEO clarifications): https://www.searchenginejournal.com/google-warns-noindex-can-block-javascript-from-running/563333/

    [2] Prerender.io — Understanding Google noindex rendering: https://prerender.io/blog/understanding-google-noindex-rendering/

    [3] Sitebulb — How JavaScript rendering affects Google indexing: https://sitebulb.com/resources/guides/how-javascript-rendering-affects-google-indexing/

    [4] Schema.org — Article type: https://schema.org/Article
