Why LLMs Cite Some Marketing Sites and Ignore Others

The 2026 playbook for getting recommended by ChatGPT, Perplexity, Claude, and Google AI Overviews. Real signals, real numbers, from the inside.

By Brad Langan · June 7, 2026 · 7 min read

LLMs cite sites that read like answers and verify like authority. Most marketing sites do neither. Here's what actually moves the needle in 2026.

Generative engine optimization is not classical SEO. The signals overlap, but the ranking function is different. A site that ranks #3 on Google can be invisible to ChatGPT, and a site that barely ranks on Google can dominate Perplexity. This is what we learned wiring it for ourselves and our clients.

What is generative engine optimization?

Generative engine optimization, or GEO, is the practice of structuring content, schema, and authority signals so that large language models cite your site as a source in their answers. It overlaps with SEO on indexability and authority, but adds a layer of answer-shape, fact density, and third-party consensus that classical SEO ignores. Sites that do GEO well get cited by ChatGPT, Perplexity, Claude, Google AI Overviews, and Microsoft Copilot.

Why this matters now

AI answer engines are eating organic search. Gartner forecasts a 25% drop in traditional search engine volume by 2026, with users shifting to AI assistants for the questions that used to feed your top-of-funnel. If a buyer asks ChatGPT 'who builds first-party attribution for subscription businesses,' the brand cited in the answer wins the lead. The brand ranking #1 on Google for the same query may never see the click.

There are five engines that matter in 2026: ChatGPT, Perplexity, Claude, Google AI Overviews, and Microsoft Copilot. Each one indexes the open web slightly differently, but the citation signals overlap enough that one playbook works for all five.

The signals LLMs actually weigh

Across the five engines, the citation function rewards a small set of signals. We've watched it in our own logs and in the AI-traffic data Similarweb publishes.

Answer-shape content. A 40-to-60-word direct answer at the top of every section, before any context.
Fact density. Specific numbers, dated claims, named sources beat vague generalities.
Schema. Article, FAQPage, HowTo, and Person markup that LLMs can parse without ambiguity.
Third-party consensus. Reddit threads, industry publications, and Wikipedia entries that name you.
Author authority. A named author with a credential trail and a public bio.
Crawlability. Server-rendered HTML. AI crawlers do not execute JavaScript.
Freshness. Updated data, not just updated dates.

Reddit is half the answer

Reddit accounts for roughly 50% of Perplexity citations and 23% of ChatGPT and Google AI Overview citations, according to citation tracking from Profound and Semrush. AI engines are about 6.5x more likely to cite Reddit than your own marketing site for the same question. If your brand is not part of the conversation on r/marketing, r/SaaS, r/Entrepreneur, r/PPC, or the subreddits your buyers actually live in, you are invisible to half the AI answers about your category.

Answer capsules beat long intros

The single biggest on-page change that lifts citation rate is rewriting every H2 to lead with a 40-to-60-word direct answer. LLMs extract these capsules and quote them. A page that buries its answer under 300 words of brand voice gets read past by both humans and machines.

Studies from Princeton's GEO research group found that adding direct answer capsules under each header lifts AI citation rate by 30 to 40% on the same content. Adding FAQ schema on top adds another 2.7x lift on long-tail queries. Combined, these two changes move pages from 'never cited' to 'cited weekly' in our tracking.
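A quick way to audit this is to count the words in the first paragraph under each header. The sketch below is a minimal checker, assuming sections are plain text with paragraphs separated by blank lines; the function names are ours, for illustration only.

```python
def capsule_word_count(section_text: str) -> int:
    """Count the words in the first paragraph of a section (the would-be capsule)."""
    first_paragraph = section_text.strip().split("\n\n", 1)[0]
    return len(first_paragraph.split())


def is_answer_capsule(section_text: str, lo: int = 40, hi: int = 60) -> bool:
    """True if the opening paragraph falls inside the 40-to-60-word window."""
    return lo <= capsule_word_count(section_text) <= hi


if __name__ == "__main__":
    # Hypothetical section: a 50-word opener followed by supporting context.
    section = " ".join(["word"] * 50) + "\n\nLonger context follows here."
    print(capsule_word_count(section), is_answer_capsule(section))
```

Run it over every section in a page and flag the ones whose opener is too long or too short. It only checks length, not whether the paragraph actually answers the header's question.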

Fact density is the new keyword density

Fact density is the ratio of specific, verifiable claims to total words. Pages with high fact density (named studies, real numbers, dated events, attributed quotes) get cited 40% more often than pages making the same argument in vague language. 'Most teams struggle with attribution' loses. '78% of marketing teams cite attribution as their top measurement gap, per the 2025 Demand Gen Report' wins.
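There is no standard formula for fact density, so treat the following as a crude proxy rather than a measurement. It counts numeric tokens and two attribution phrases ("per", "according to") and divides by total words. The pattern list is our assumption and will miss named studies and quotes that carry no number.

```python
import re

# Assumed "fact markers": numbers or percentages, and simple attribution phrases.
FACT_PATTERNS = [
    r"\d+(?:\.\d+)?%?",
    r"\b(?:per|according to)\b",
]


def fact_density(text: str) -> float:
    """Rough ratio of fact markers to total words in the text."""
    words = text.split()
    if not words:
        return 0.0
    markers = sum(
        len(re.findall(pattern, text, flags=re.IGNORECASE))
        for pattern in FACT_PATTERNS
    )
    return markers / len(words)


if __name__ == "__main__":
    vague = "Most teams struggle with attribution"
    specific = (
        "78% of marketing teams cite attribution as their top "
        "measurement gap, per the 2025 Demand Gen Report"
    )
    print(fact_density(vague), fact_density(specific))
```

The useful part is the comparison between drafts of the same page, not the absolute score.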

Schema that LLMs actually use

Five schema types do most of the work. Article schema on every editorial page. FAQPage schema for any Q&A block. HowTo schema for procedural content. Person schema for the author, linked from the article. Organization schema sitewide for brand identity. Skip the rest. Recipe, Movie, and Product schema only matter for those verticals.

Schema	Use it for	Estimated citation lift
Article	Editorial pages, insights, posts	15 to 20%
FAQPage	Q&A blocks at end of page	2.7x on long-tail
HowTo	Step-by-step procedures	20 to 30%
Person	Author bio, credentials	10 to 15%
Organization	Sitewide brand identity	Baseline requirement
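As an illustration, the Article, Person, and FAQPage rows above can be emitted as JSON-LD at build time. This is a minimal sketch using schema.org property names; the helper functions and the example.com URL are placeholders, not our production code.

```python
import json


def article_jsonld(headline, author_name, author_url, date_published):
    """Article markup with a nested Person as the author."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
    }


def faq_jsonld(qa_pairs):
    """FAQPage markup built from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def to_script_tag(data):
    """Wrap JSON-LD in a script tag, escaping '</' so text cannot close the tag."""
    body = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


if __name__ == "__main__":
    article = article_jsonld(
        "Why LLMs Cite Some Marketing Sites and Ignore Others",
        "Brad Langan",
        "https://example.com/about/author",  # placeholder bio URL
        "2026-06-07",
    )
    print(to_script_tag(article))
```

Put the resulting tags in the server-rendered HTML so the markup is present without JavaScript.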

JavaScript is the silent killer

GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. If your site is a single-page React or Vue app that renders content client-side, AI crawlers see an empty body. They cannot cite what they cannot read. One of server-side rendering, static site generation, or prerendering is non-negotiable for GEO.

We solved this on Moonshot by adding a prerender step at build time. Every page renders to static HTML with a hidden-but-crawlable body containing the full page content, JSON-LD, OG tags, and FAQ markup. The SPA hydrates on top for human users. Crawlers see the full content from the first byte.
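To see what a non-JavaScript crawler gets, count the words in the raw HTML without running any scripts. The sketch below uses only the Python standard library; the class and function names are ours, and skipping script, style, template, and title content is our assumption about what counts as readable body text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside elements a non-JS crawler would not read as content."""

    SKIP = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html: str) -> int:
    """Number of words present in the markup before any JavaScript runs."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


if __name__ == "__main__":
    spa_shell = (
        "<html><head><title>App</title></head>"
        '<body><div id="root"></div><script>render()</script></body></html>'
    )
    prerendered = "<html><body><h1>Hello world</h1><p>Three more words</p></body></html>"
    print(visible_word_count(spa_shell), visible_word_count(prerendered))
```

Feed it the response body you get when requesting a page with a crawler user agent. A count near zero means the page is a client-side shell.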

llms.txt and llms-full.txt

llms.txt is a proposed standard for telling LLMs how your site is organized. Google has publicly said it does not use it. Anthropic and OpenAI have not committed either way, but they do crawl it. It is cheap to ship: a single markdown file at /llms.txt with your structure, plus an optional /llms-full.txt with the full content of your key pages. The downside is zero. The upside is non-zero.
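Generating the file is a few lines. The sketch below follows the layout in the llms.txt proposal as we understand it: an H1 with the site name, a blockquote summary, then H2 sections of markdown links. The site name, URLs, and section titles are placeholders.

```python
def build_llms_txt(site_name, summary, sections):
    """Build llms.txt content from {heading: [(title, url, note), ...]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    content = build_llms_txt(
        "Example Co",  # placeholder site
        "Marketing systems for subscription businesses.",
        {
            "Insights": [
                ("GEO playbook", "https://example.com/insights/geo", "How LLMs pick sources"),
            ],
        },
    )
    print(content)
```

Write the result to /llms.txt in your public folder as part of the build so it stays in sync with the pages it lists.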

Author authority and E-E-A-T

Google's quality raters look for Experience, Expertise, Authoritativeness, and Trustworthiness. LLMs weigh similar signals when deciding which sources to cite. A page written by 'admin' or 'the team' is weaker than a page written by a named human with a credential trail. We add author bylines, link them to a bio page with Person schema, and surface the author's real credentials: companies founded, dollars managed, certifications held.

96% of citations in Google AI Overviews come from sources Google's quality systems classify as authoritative, per a 2025 Semrush analysis. If you are not on that list, you are not getting cited.

IndexNow and Bing

Bing powers ChatGPT search, so Bing's index matters more than its market share suggests. IndexNow is a free protocol that lets you ping Bing the second you publish, instead of waiting for them to crawl you. Generate an API key, drop the key file in your public folder, and POST your URLs on every deploy. Google does not support IndexNow, but Bing, Yandex, Naver, and Seznam do.
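The deploy-time ping can be sketched as follows. The endpoint and the JSON fields (host, key, keyLocation, urlList) come from the IndexNow protocol; the host, key value, and the choice to serve the key file at the site root are placeholders and assumptions you would replace with your own.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) the POST request for a batch of URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file sits at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_indexnow_request(
        "example.com",  # placeholder host
        "0123456789abcdef",  # placeholder key
        ["https://example.com/insights/geo"],
    )
    print(request.get_method(), request.full_url)
    # In a real deploy step you would send it:
    # urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the deploy script easy to test without hitting the network.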

A 30-day GEO sprint

If you are starting from zero, here is the order we recommend.

Week 1: Prerender. Make sure every page renders to static HTML with full body content. Verify with `curl -A "ClaudeBot/1.0" yoursite.com/page`.
Week 2: Rewrite. Every H2 gets a 40-to-60-word answer capsule. Every page ends in a 5-question FAQ with FAQPage schema. Every fact gets a specific number or named source.
Week 3: Authority. Add Article + Person schema. Build out author bios. Ship llms.txt and llms-full.txt. Set up IndexNow.
Week 4: Distribution. Get into the relevant subreddits as a real participant. Pitch one industry publication. Update three existing pages with fresh data (not just dates).

Frequently asked

Is GEO different from SEO?

Yes. SEO optimizes for click-through from a search engine result page. GEO optimizes for citation inside an AI-generated answer. The signals overlap on indexability and authority, but GEO adds answer-shape, fact density, and third-party consensus weighting that SEO ignores.

Do AI crawlers respect robots.txt?

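Most do. GPTBot, ClaudeBot, and PerplexityBot all check robots.txt, and Google-Extended is a robots.txt token that controls whether Google's AI models may use your content rather than a separate crawler. You can allow or block each one independently. We recommend allowing all of them for editorial content.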
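To confirm what your robots.txt actually permits, you can evaluate it per agent with Python's standard robotparser. This is a minimal sketch: the agent list and the sample rules are illustrative, and it only tells you what the file says, not whether a given crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens named in this article; extend the list as needed.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, url: str) -> dict:
    """Map each AI agent token to whether robots.txt allows it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


if __name__ == "__main__":
    sample_rules = "\n".join(
        [
            "User-agent: GPTBot",
            "Disallow: /private/",
            "",
            "User-agent: PerplexityBot",
            "Disallow: /",
            "",
            "User-agent: *",
            "Allow: /",
        ]
    )
    print(ai_crawler_access(sample_rules, "https://example.com/insights/geo"))
```

Run it against your live robots.txt text and a handful of editorial URLs after any change to the file.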

Does llms.txt actually work?

It is unverified but cheap. Google has said publicly that it does not use it. Anthropic and OpenAI have not committed either way, but they do crawl it. There is no downside to shipping it.

How long until I see citation lift?

In our tracking, FAQ schema and answer capsules show citation lift within two weeks. Reddit presence takes longer: two to three months of genuine participation. Third-party publication mentions are the slowest, often six months from pitch to citation.

What tools track AI citations?

Profound, Siftly, Peec AI, Otterly, and Scrunch all track which AI engines cite your brand on which prompts. Most have free tiers. Pick one and run it weekly.

What we do for our own site

Everything in this article is what we ship on gomoonshot.com. Prerender at build, answer capsules under every H2, FAQ schema on every page, Person schema on Brad, llms.txt and llms-full.txt at the root, IndexNow on every deploy, fresh articles weekly. We track citations across the five engines and we update what works.

If you want this wired into your own site, the same way, talk to us. It is part of every Blueprint.