What Sources Does ChatGPT Cite? Data From 184,212 Citations

What sources does ChatGPT cite when it answers a buying question — and are they the same ones Perplexity and Gemini lean on? No. Each platform pulls from a measurably different mix of editorial sites, vendor docs, Reddit threads, review platforms and reference pages. Optimize for the wrong mix and you stay invisible on the platform your buyers actually use.

(Looking for how to cite ChatGPT in an academic paper? Different question — this page is a data study of the citations AI answers themselves contain.) We analyzed 184,212 citations from 40,950 answer snapshots collected by MaxAEO's citation tracing between March 1 and May 31, 2026, and broke down exactly which sources ChatGPT, Perplexity and Gemini cite — by domain, by source type, and by query intent.

The short answer: what ChatGPT cites most

ChatGPT cites editorial and news sites most — 24% of citations in our dataset — followed by vendor-owned docs and blogs (21%), reference sites like Wikipedia (13%), review platforms (12%) and community forums such as Reddit (11%). No single domain dominates: the long tail does most of the work.

That last point matters more than any top-10 list. In our data, the ten most-cited domains account for only 14% of all ChatGPT citations. Profound's study of ~730,000 cited ChatGPT conversations (October–December 2025) found nearly the same thing: the top 10 domains captured just 12% of citations, with Wikipedia — the single biggest domain — at only about 5%, even though it appeared in 18% of cited conversations.

The 10 domains ChatGPT cites most

From our 2026 B2B-weighted dataset, ChatGPT's most-cited individual domains:

Rank  Domain            Share of all ChatGPT citations
1     en.wikipedia.org  4.8%
2     reddit.com        2.1%
3     g2.com            1.6%
4     forbes.com        1.3%
5     youtube.com       1.1%
6     techradar.com     0.8%
7     capterra.com      0.7%
8     gartner.com       0.6%
9     linkedin.com      0.5%
10    medium.com        0.5%
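As a quick check on how little the head of the distribution holds, the ten shares in the table can be summed directly. This is a minimal sketch that uses only the percentages listed above; the variable and function names are illustrative and not part of MaxAEO's tooling.

```python
# Share of all ChatGPT citations for the ten most-cited domains (from the table above).
TOP10_SHARE_PCT = {
    "en.wikipedia.org": 4.8,
    "reddit.com": 2.1,
    "g2.com": 1.6,
    "forbes.com": 1.3,
    "youtube.com": 1.1,
    "techradar.com": 0.8,
    "capterra.com": 0.7,
    "gartner.com": 0.6,
    "linkedin.com": 0.5,
    "medium.com": 0.5,
}


def head_and_tail(shares_pct):
    """Return (head_total, tail_total) in percentage points, rounded to one decimal."""
    head_total = round(sum(shares_pct.values()), 1)
    tail_total = round(100.0 - head_total, 1)
    return head_total, tail_total


head_total, tail_total = head_and_tail(TOP10_SHARE_PCT)
print(f"top 10: {head_total}%, long tail: {tail_total}%")  # top 10: 14.0%, long tail: 86.0%
```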

Together: 14% — meaning 86% of ChatGPT's citations go to everyone else. So the practical question is not "how do I get onto the three domains ChatGPT loves." It is "which source types does each platform trust for my category, and am I present on them?" That is what the rest of this study answers.

[Figure: chart of the sources ChatGPT cites most, compared with Perplexity and Gemini, broken down by source type (MaxAEO citation tracing data)]

How we measured it: MaxAEO's citation tracing methodology

MaxAEO is an AI visibility tool that runs tracked prompts against ChatGPT, Gemini, Perplexity, Claude, Copilot, Grok and Google's AI surfaces every day, recording which brands get mentioned and which URLs get cited. For this study we isolated one slice:

  • Prompt set: 1,050 prompts across 14 B2B software categories (CRM, support, analytics, security, HR tech and others), mixing informational ("what is…", "how does…") and commercial ("best…", "X vs Y", "alternatives to…") intent.
  • Platforms: ChatGPT (search enabled), Perplexity, and Gemini (with grounding), each prompt run weekly.
  • Window: March 1 – May 31, 2026 — 40,950 answer snapshots, 184,212 extracted citations.
  • Classification: every cited URL bucketed into one of eight source types by domain rules plus manual review of ambiguous domains.
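The classification step can be approximated with a small rule table. The sketch below is an assumption about how domain rules might look, not the study's actual rule set: `DOMAIN_RULES` contains only the example domains this article names for each bucket, and `vendor_domains` is a hypothetical caller-supplied set of vendor sites.

```python
from urllib.parse import urlparse

# Example rules built only from the domains this article names per bucket.
# The study's full rule set is not published here; extend this for real use.
DOMAIN_RULES = {
    "forbes.com": "Editorial & news",
    "techcrunch.com": "Editorial & news",
    "reddit.com": "Community & forums",
    "quora.com": "Community & forums",
    "stackoverflow.com": "Community & forums",
    "g2.com": "Review & comparison sites",
    "capterra.com": "Review & comparison sites",
    "gartner.com": "Review & comparison sites",
    "wikipedia.org": "Reference",
    "linkedin.com": "Social & video",
    "youtube.com": "Social & video",
}
SUFFIX_RULES = {".edu": "Academic & government", ".gov": "Academic & government"}


def classify(url, vendor_domains=frozenset()):
    """Bucket a cited URL into one of the eight source types by domain rules."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Check the full host, then each parent domain (docs.example.com -> example.com).
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in vendor_domains:
            return "Vendor-owned"
        if candidate in DOMAIN_RULES:
            return DOMAIN_RULES[candidate]
    for suffix, bucket in SUFFIX_RULES.items():
        if host.endswith(suffix):
            return bucket
    # Unmatched domains would need manual review; here they fall into the long tail.
    return "Other / long tail"


print(classify("https://en.wikipedia.org/wiki/Customer_relationship_management"))
print(classify("https://docs.example.com/pricing", vendor_domains={"example.com"}))
```

Matching on parent domains keeps subdomains such as `en.wikipedia.org` or `www.reddit.com` in the same bucket as their root domain.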

One honest caveat: this is a B2B-software-weighted prompt set. Consumer queries about health, travel or news produce a different mix — Profound's consumer-population numbers above are the right reference point there. Percentages below are shares of all citations on each platform, not shares of answers.

ChatGPT vs. Perplexity vs. Gemini: the source-type mix

The same question gets sourced three different ways. The full breakdown from our 184,212 traced citations:

Source type                                         ChatGPT  Perplexity  Gemini
Editorial & news (Forbes, TechCrunch, trade press)  24%      19%         22%
Vendor-owned (docs, product pages, company blogs)   21%      17%         26%
Community & forums (Reddit, Quora, Stack Overflow)  11%      21%         9%
Review & comparison sites (G2, Capterra, Gartner)   12%      14%         8%
Reference (Wikipedia and other encyclopedic sites)  13%      7%          6%
Social & video (LinkedIn, YouTube)                  8%       6%          17%
Academic & government (.edu, .gov, research)        5%       11%         5%
Other / long tail                                   6%       5%          7%

Three headline patterns:

  1. ChatGPT is an editorial-and-reference engine. It over-indexes on journalism and Wikipedia relative to the other two.
  2. Perplexity is the UGC engine. Community content takes roughly one citation in five — Reddit alone is its single biggest domain, consistent with Semrush's three-month, 100-million-citation study, which found Reddit among the top sources on every platform but most concentrated on Perplexity.
  3. Gemini trusts you to describe yourself. Vendor-owned content (26%) plus the social-and-video bucket (17%, which includes YouTube, a Google property) make up 43% of its citations, the highest combined share for those two buckets on any platform.

Citation slots differ too. Median distinct domains per answer: Perplexity 5, ChatGPT 4, Gemini 3. Fewer slots means each Gemini citation is harder to win.
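The headline patterns can be re-derived from the source-type table. The sketch below encodes the published percentages as a dictionary and computes the leading platform per source type plus the combined share of any set of buckets; the helper names are illustrative.

```python
# Percentages copied from the source-type table above (shares of all citations per platform).
MIX = {
    "Editorial & news": {"ChatGPT": 24, "Perplexity": 19, "Gemini": 22},
    "Vendor-owned": {"ChatGPT": 21, "Perplexity": 17, "Gemini": 26},
    "Community & forums": {"ChatGPT": 11, "Perplexity": 21, "Gemini": 9},
    "Review & comparison sites": {"ChatGPT": 12, "Perplexity": 14, "Gemini": 8},
    "Reference": {"ChatGPT": 13, "Perplexity": 7, "Gemini": 6},
    "Social & video": {"ChatGPT": 8, "Perplexity": 6, "Gemini": 17},
    "Academic & government": {"ChatGPT": 5, "Perplexity": 11, "Gemini": 5},
    "Other / long tail": {"ChatGPT": 6, "Perplexity": 5, "Gemini": 7},
}


def leaders(mix):
    """Map each source type to the platform with the largest share of it."""
    return {source: max(shares, key=shares.get) for source, shares in mix.items()}


def combined_share(mix, platform, source_types):
    """Sum one platform's share across several source types."""
    return sum(mix[source][platform] for source in source_types)


def column_total(mix, platform):
    """Sanity check: each platform's column should add up to 100."""
    return sum(shares[platform] for shares in mix.values())


for source, platform in leaders(MIX).items():
    print(f"{source}: {platform}")
print(combined_share(MIX, "Gemini", ["Vendor-owned", "Social & video"]))  # 43
```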

What ChatGPT cites: editorial first, Wikipedia still over-weighted

ChatGPT's mix rewards earned media. Editorial and news (24%) plus reference pages (13%) add up to 37% of its sourcing, so more than a third comes from content you don't control: you earn your way in. On our commercial-intent prompts ("best X for Y"), an editorial roundup or trade-press comparison appeared in the citation set of 71% of ChatGPT answers.

Part of this tilt is contractual. OpenAI has signed content-licensing deals with the Associated Press, Axel Springer, the Financial Times, News Corp, Condé Nast, Hearst and Reddit, among others, so licensed content reaches retrieval with cleaner access than the open web. News content punches above its weight partly because OpenAI pays for it.

Two behaviors change what gets cited:

  • Citations cluster early in conversations. Profound found a first-turn message is about 2.5× more likely to trigger web search (and therefore citations) than a tenth-turn message. Your brand's first impression is decided in turn one.
  • The mix is volatile. Semrush documented a sharp drop in Reddit and Wikipedia citation rates on ChatGPT after a retrieval-pipeline change in mid-September 2025, with professional sources like Forbes, Medium and LinkedIn gaining share within weeks. Whatever mix you measured last quarter is not this quarter's mix.

The lever this points to: if ChatGPT is your priority platform, digital PR aimed at the publications AI already trusts buys more visibility per dollar than another on-site blog post.

What Perplexity cites: the Reddit and community engine

Perplexity is the platform where user-generated content decides who gets recommended. Community and forum citations hit 21% in our dataset, nearly double ChatGPT's 11%, and academic/government sources reach 11%, more than double the 5% on either of the other two platforms. Both reflect Perplexity's own index, which ranks discussion threads and primary sources aggressively.

The commercial-intent numbers are starker. On "best/vs/alternatives" prompts, review sites plus community threads together supplied 41% of Perplexity's citations. A skeptical Reddit thread from 2024 can outrank your polished comparison page in Perplexity's sourcing for years.

A worked example from May 14, 2026 — the prompt "best ticketing system for B2B SaaS support teams", run the same hour on all three platforms:

  • Perplexity cited two Reddit threads, G2, Capterra and one vendor blog.
  • ChatGPT cited a Forbes Advisor roundup, G2, a vendor doc and one Reddit thread.
  • Gemini cited two vendor comparison pages, a YouTube review and one trade-press article.

Same question, three different juries. If AI-driven recommendations matter to your pipeline, Reddit is upstream infrastructure for AI answers — not optional.

[Figure: side-by-side comparison of ChatGPT, Perplexity and Gemini answers citing different source types for the same B2B software prompt]

What Gemini cites: Google's ecosystem and your own site

Gemini gives the highest weight to content brands control: vendor docs, product pages and company blogs make up 26% of its citations, and YouTube pushes the social-and-video bucket to 17%. Both numbers lead the three platforms, while review sites (8%) and Wikipedia (6%) trail.

The most actionable stat in our dataset sits here. On brand-direct prompts ("what is [brand]", "[brand] pricing", "is [brand] secure"), Gemini cited the brand's own domain in 61% of answers — versus 38% for ChatGPT. If your documentation is thin, outdated or blocked from crawling, Gemini fills the gap with whatever third-party content exists, accurate or not. That makes documentation quality an AI reputation management issue, not just a support issue.

Gemini's behavior also correlates strongly with classic Google standing: pages ranking in the top organic results get cited far more often — the same pattern we document for Google AI Overviews inclusion. Semrush adds a related signal from Google's AI Mode: LinkedIn appeared in roughly 15% of responses there, while Wikipedia fell to about 2%. Google's surfaces simply don't need an encyclopedia layer the way ChatGPT does.

Why the same question gets different sources

The differences are architectural, not random. Each platform retrieves from a different index with different ranking incentives:

  • ChatGPT retrieves through Bing's index plus OpenAI's own crawler (OAI-SearchBot) and the licensing layer above — which is why news and reference content punch above their weight.
  • Perplexity built its own retrieval index and explicitly boosts discussions and primary sources — which is why Reddit and .gov content surface so often.
  • Gemini grounds against Google Search, inheriting Google's ranking judgments and its preference for YouTube and authoritative first-party pages.

One distinction worth keeping straight: training data is not citations. What a model "knows" from pre-training shapes unsourced answers; citations only appear when the platform retrieves live web pages at answer time. You can influence retrieval this quarter — training data, only over years.

The consequence for marketers: "AI visibility" is not one channel. It is three or more retrieval systems with different source diets. A strategy that only feeds one diet — say, blog posts on your own domain — can win Gemini while staying invisible on ChatGPT and Perplexity. Answer engine optimization starts with knowing which diet your target platform eats.

Why public studies disagree (and how to read them)

If you've seen claims that "Wikipedia is 47% of ChatGPT citations" and also "Wikipedia is 5%," neither is necessarily wrong — they measure different things. Before taking any number into a budget meeting, check three things:

  1. The metric. Share of all citations (our method, and Profound's) produces small-looking numbers because of the long tail. Share of responses citing a domain at least once (Semrush's weekly view) produces big numbers — Reddit touched ~60% of ChatGPT responses in early August 2025 by that measure. Same reality, different denominator.
  2. The prompt set. Consumer health prompts pull .gov and medical sources; B2B software prompts pull G2 and trade press. A study's mix reflects its questions.
  3. The date. The September 2025 retrieval shift moved domain-level numbers by tens of points within weeks. Citation data has a shelf life of about a quarter.

This is why one-off screenshots are weak evidence for LLM brand tracking: you need the same prompts, re-run on a schedule, with the metric definition held constant.
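The denominator difference in point 1 is easy to see on a toy sample. In the sketch below, each response is a list of cited domains; the five responses are invented for illustration and are not drawn from any study. The same data gives Reddit a 15% share of citations and a 60% share of responses.

```python
def share_of_citations(responses, domain):
    """Citations to `domain` divided by all citations (this study's metric)."""
    total = sum(len(cited) for cited in responses)
    hits = sum(cited.count(domain) for cited in responses)
    return hits / total if total else 0.0


def share_of_responses(responses, domain):
    """Responses citing `domain` at least once, divided by all responses."""
    if not responses:
        return 0.0
    return sum(1 for cited in responses if domain in cited) / len(responses)


# Invented sample: 5 responses, 20 citations in total, Reddit cited once in 3 of them.
SAMPLE = [
    ["reddit.com", "g2.com", "forbes.com", "vendor.example", "techradar.com"],
    ["reddit.com", "capterra.com", "vendor.example", "en.wikipedia.org"],
    ["g2.com", "forbes.com", "vendor.example"],
    ["reddit.com", "gartner.com", "youtube.com", "vendor.example"],
    ["en.wikipedia.org", "linkedin.com", "medium.com", "g2.com"],
]

print(f"share of citations: {share_of_citations(SAMPLE, 'reddit.com'):.0%}")  # 15%
print(f"share of responses: {share_of_responses(SAMPLE, 'reddit.com'):.0%}")  # 60%
```

When comparing two studies, the first thing to establish is which of these two functions each number came from.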

What this means for your AI visibility strategy

Map effort to each platform's actual source diet instead of spreading evenly:

  1. Audit where you stand today. Run your 20 highest-intent category prompts on ChatGPT, Perplexity and Gemini. Log every cited URL and bucket it by the taxonomy above. The gaps tell you which source type to fix first.
  2. For ChatGPT: earn editorial citations. Pitch data studies and expert commentary to the trade publications already appearing in your category's answers. Reference-style content — clear definitions, comparison tables — also travels well.
  3. For Perplexity: invest in community presence. Founder-level participation in relevant subreddits, honest answers on Stack Overflow and Quora, and review velocity on G2/Capterra move citation share faster than anything on your own domain.
  4. For Gemini: harden your owned content. Complete, crawlable docs, a maintained pricing page, structured data and YouTube product content target the vendor-owned and social-and-video buckets that together make up 43% of its citation diet.
  5. Re-measure monthly and report AI share of voice, not anecdotes. Citation mixes shift; September 2025 proved they can shift fast.

Generative engine optimization, done this way, stops being a hype line and becomes an allocation decision: which source types, in which order, for which platform.
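The audit in step 1 can be tallied with a few lines of code once the citations are logged. This sketch assumes one particular definition of a "gap", which the study itself does not specify: the number of cited pages in a source type that do not mention your brand. The row layout `(platform, source_type, mentions_brand)` and the sample rows are hypothetical.

```python
from collections import Counter


def source_gaps(rows, platform):
    """Rank source types on one platform by cited pages that omit the brand.

    rows: iterable of (platform, source_type, mentions_brand) tuples.
    Returns a list of (source_type, pages_without_brand, total_pages).
    """
    totals, with_brand = Counter(), Counter()
    for row_platform, source_type, mentions_brand in rows:
        if row_platform != platform:
            continue
        totals[source_type] += 1
        if mentions_brand:
            with_brand[source_type] += 1
    gaps = [(s, totals[s] - with_brand[s], totals[s]) for s in totals]
    # Largest gap first; ties are broken alphabetically for stable output.
    return sorted(gaps, key=lambda gap: (-gap[1], gap[0]))


# Hypothetical audit log, for illustration only.
AUDIT_ROWS = [
    ("ChatGPT", "Editorial & news", False),
    ("ChatGPT", "Editorial & news", False),
    ("ChatGPT", "Editorial & news", True),
    ("ChatGPT", "Review & comparison sites", True),
    ("ChatGPT", "Vendor-owned", True),
    ("Perplexity", "Community & forums", False),
    ("Perplexity", "Community & forums", False),
    ("Perplexity", "Review & comparison sites", True),
]

for source_type, missing, total in source_gaps(AUDIT_ROWS, "ChatGPT"):
    print(f"{source_type}: brand absent from {missing} of {total} cited pages")
```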

How to track which sources AI cites for your brand

The manual version: fixed prompt list, weekly runs, paste every citation into a spreadsheet, classify by type. It works — for about two weeks, one platform and one product category. Past that, daily sampling across six platforms is what reveals the volatility that actually changes decisions.
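If you do keep a spreadsheet, comparing two snapshots is enough to surface large swings. The sketch below flags domains whose share moved by at least a threshold between two runs. The 5-point threshold is an arbitrary assumption, and apart from the roughly 60% to 10% Reddit figures quoted from Semrush elsewhere in this article, the sample values are invented.

```python
def flag_shifts(previous, current, threshold_pts=5.0):
    """Return {domain: change} for shares that moved by at least threshold_pts points."""
    shifts = {}
    for domain in sorted(set(previous) | set(current)):
        # A domain missing from one snapshot is treated as a 0% share.
        delta = current.get(domain, 0.0) - previous.get(domain, 0.0)
        if abs(delta) >= threshold_pts:
            shifts[domain] = round(delta, 1)
    return shifts


# Share of responses citing each domain, in percent. Only the Reddit values
# echo figures quoted in this article; the rest are made up for the example.
BEFORE = {"reddit.com": 60.0, "forbes.com": 8.0, "g2.com": 12.0}
AFTER = {"reddit.com": 10.0, "forbes.com": 15.0, "g2.com": 13.0}

print(flag_shifts(BEFORE, AFTER))  # {'forbes.com': 7.0, 'reddit.com': -50.0}
```

Both snapshots must use the same prompts and the same metric definition, otherwise the differences reflect the measurement rather than the platform.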

That continuous version is what MaxAEO automates: it monitors brand mentions in ChatGPT, Gemini, Perplexity, Claude, Copilot, Grok and Google's AI surfaces daily, traces every citation behind those answers, and tells you which specific source gap — a missing G2 profile, a stale Reddit thread, thin docs — is suppressing your recommendations on each platform. The fix list, not just the dashboard, is the point: that's how you get recommended by ChatGPT more often instead of just watching a score move.

Frequently asked questions

Does ChatGPT cite sources in every answer?

No. ChatGPT only cites sources when it decides to search the web (or when search is forced). Profound's data shows search triggers most often on a conversation's first message — about 2.5× more likely than by turn ten. Pure model-memory answers carry no citations at all, which is why brand perception there depends on training data, not links.

How do I see which sources ChatGPT used in an answer?

Search-enabled answers show inline citation chips after the sentences they support, plus a sources panel listing every cited page at the end of the answer. Click any chip to open the underlying URL. If an answer shows no links, ChatGPT answered from model memory and used no retrievable sources.

Are the sources ChatGPT cites always real?

Citations in search-enabled answers are real, clickable URLs retrieved at answer time. The notorious fabricated references come from non-browsing mode, when users ask the model to "list sources" and it generates plausible-looking but sometimes nonexistent ones. For brand monitoring, only retrieval-backed citations are worth tracking.

Which platform is easiest to get cited on?

Usually Gemini for your own branded queries (61% own-domain citation rate in our data, if your docs are solid) and Perplexity for category queries, because community content and fresh primary sources can enter its index within days. ChatGPT's editorial-heavy mix takes longest — earned media has lead time.

How often does the AI citation mix change?

Treat any snapshot as valid for roughly one quarter. Semrush's 13-week study captured Reddit's share of ChatGPT responses falling from about 60% to about 10% within weeks of a single retrieval-pipeline change in September 2025. Continuous AI search monitoring exists precisely because these shifts arrive unannounced.

Does llms.txt help me get cited?

The evidence so far is thin — major platforms haven't confirmed honoring it, and we see no citation lift attributable to it in our tracking. Basics matter more: don't block OAI-SearchBot, PerplexityBot or Google-Extended in robots.txt, keep docs crawlable, and build presence on the source types each platform already trusts.
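A quick way to confirm you are not blocking those crawlers is to test your robots.txt against each user-agent token. The sketch below uses Python's standard-library `urllib.robotparser`; the sample robots.txt and the `https://example.com/docs/` URL are placeholders. Python's parser is only an approximation of how each crawler interprets the file, so treat the result as a first check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the answer above.
AI_AGENTS = ("OAI-SearchBot", "PerplexityBot", "Google-Extended")


def blocked_agents(robots_txt, url="https://example.com/docs/"):
    """Return the AI user agents that this robots.txt text blocks from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Placeholder robots.txt that (perhaps accidentally) blocks one AI crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE_ROBOTS_TXT))  # ['PerplexityBot']
```

To run it against a live site, fetch the site's `/robots.txt` and pass the text in; an empty list means none of the three tokens is disallowed for that URL.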


This article was created with AI assistance from original MaxAEO tracking data and reviewed by a human editor.


Written by

Founder of MaxAEO. Helping brands get found in AI search across ChatGPT, Perplexity, Google AI Overviews, and more.
