Generative AI like ChatGPT isn't just answering questions; it's steering buying decisions. Today, 58% of consumers rely on AI for product and service recommendations (up from 25% in 2023), and AI-driven search referrals surged 1,300% last holiday season [hbr.org]. This shift is reflected in data from 3,000+ marketers using our AI SEO agent, KIVA, which tracks where ChatGPT sources and cites content across synthetic query workflows.
At Wellows, we dug into real-world data from thousands of KIVA users, analyzing 7,785 ChatGPT queries and 485,000+ citations across 38,000+ domains. The verdict? AI now plays a significant role in product discovery, with consumer reliance on recommendations increasing by 132% year-over-year.
But it's not just about getting mentioned. We set out to decode which sites large language models (LLMs) trust and cite, and why. What did we find? The rules have changed, and what works might surprise you.
In this post, I'll unpack those insights and what they mean for content creators and marketers trying to stay ahead of the curve.
- Top 50 domains get 48% of citations, but 52% go to long-tail sites - niche expertise can compete with authority
- Commercial queries favor product sites (20%) and tech media (22%) - content strategy must align with user intent
- Recent content wins for time-sensitive queries, authority wins for evergreen topics - balance is key
- Authoritative + structured + intent-aligned = higher citation chances in AI responses
Let's dive into the data and see what it reveals about aligning your content strategy with the era of generative search.
This report is based on 7,785 anonymized ChatGPT queries captured through the KIVA platform in 2024, generating over 485,000 citations across 38,000+ unique domains. All data was gathered in compliance with privacy standards and reflects enterprise-grade usage across a wide range of content creation intents.
We gathered anonymized logs from 3,000+ marketers. KIVA generated synthetic queries for keywords, simulating searches and tracking which sources ChatGPT cited to reveal AI-native citation behavior.
KIVA classified synthetic queries by intent, domain type, and timing. This structure traced how LLMs respond to machine-generated prompts, linking each citation to context for deeper behavioral insight.
Cited domains were tagged and grouped by type, time, and intent. This analysis showed which content LLMs prefer, helping marketers boost visibility and increase chances of being cited by ChatGPT.
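The tagging pipeline described above can be sketched roughly as follows. This is a minimal illustration, not KIVA's actual schema: the field names, intent labels, and domain types are assumptions for the example.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class Citation:
    query: str        # the synthetic query ChatGPT answered
    domain: str       # domain of the cited source
    intent: str       # e.g. "commercial", "informational", "transactional" (assumed labels)
    domain_type: str  # e.g. "tech_media", "product", "education" (assumed labels)

def summarize(citations):
    """Group citation counts by domain type within each intent bucket."""
    summary = {}
    for c in citations:
        summary.setdefault(c.intent, Counter())[c.domain_type] += 1
    return summary

# Toy sample to show the shape of the output
sample = [
    Citation("best AI writing tools", "techradar.com", "commercial", "tech_media"),
    Citation("best AI writing tools", "jasper.ai", "commercial", "product"),
    Citation("what is generative search", "hbr.org", "informational", "education"),
]
print(summarize(sample))
```

Grouping every citation this way is what lets the later sections compare citation shares by intent and by domain category.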
One of the first analyses we ran was to categorize the top-cited domains by their industry or content archetype. In other words, what kinds of sites does ChatGPT tend to pull answers from?
The results showed a clear divide between tech media publishers and product/service websites, with a few educational and consulting sources also included.
To start, here are the 50 most-cited domains in our dataset:
TechRadar leads with around 14,500 citations, followed closely by trusted tech publishers like CNET, PCMag, Tom's Guide, and TechCrunch. These sites frequently appeared in queries such as "top AI optimization tools" or "best tools for productivity," suggesting that ChatGPT leans heavily on authoritative review platforms for curated content.
What stands out even more is the high visibility of official product websites, such as OpenAI and HubSpot. These domains often show up when ChatGPT references tool features or offers product comparisons.
The trend is clear: tech media and product vendor pages dominate citation rankings, highlighting the SEO value of third-party credibility and well-structured, informative product content.
It's important to note that beyond the leaders, citation frequency drops off into a long tail. The top 50 domains accounted for almost 48% of all citations, while the long tail of other domains accounted for the remaining 52%.
Our data included references to over 38,000 unique domains, indicating that ChatGPT drew information from an extensive range of sources. This "fat head, long tail" distribution is visualized below:
This indicates that, although a handful of sites are frequently cited, ChatGPT also draws answers from a broad long tail of niche sources. In practical terms, authority sites command a significant chunk of citations, but there's plenty of opportunity in the tail. Many smaller blogs, company sites, and forums each contributed a few citations here and there.
It suggests that if your content is highly relevant to a specific question, ChatGPT may find and cite it even if your site isn't a household name, though building authority helps (more on that later).
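The "fat head, long tail" split is straightforward to measure on any citation dataset. Here is a minimal sketch, assuming you have per-domain citation counts; the numbers below are made up for illustration:

```python
def head_tail_share(counts, head_size=50):
    """Return (head_share, tail_share) of citations for the top-N domains."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    head = sum(ranked[:head_size])
    return head / total, (total - head) / total

# Toy data: 3 "head" domains plus a long tail of small sites
counts = {"techradar.com": 900, "cnet.com": 700, "pcmag.com": 400}
counts.update({f"niche-blog-{i}.com": 5 for i in range(400)})  # 2,000 tail citations

head, tail = head_tail_share(counts, head_size=3)
print(f"head: {head:.0%}, tail: {tail:.0%}")  # head: 50%, tail: 50%
```

Running the same computation with `head_size=50` against real citation logs is how a 48%/52% head/tail split like ours would be derived.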
We manually categorized a sample of the top 50 domains into a few archetypes and estimated their share of total citations.
Here's a breakdown:
| Domain Category | Examples (Top Cited Sites) | Approx. Citation Share |
|---|---|---|
| Tech Media & Publishers | TechRadar, CNET, PCMag, Forbes, TomsGuide, TechCrunch, Wired, TheVerge, Technology Review, Technadu, VentureBeat, CIO, ZDNet | ~46.5% |
| Product & SaaS Websites | OpenAI, HubSpot, Jasper.ai, Copy.ai, Semrush, Salesforce, Writesonic, Grammarly, Adobe, Moz, Microsoft, Ahrefs, Hootsuite, Neil Patel, AI.Google, Netflix, Canva, Buffer | ~25.3% |
| Others | Comparitech, VPNPro, EFF, Norton, McAfee, Search Engine Journal, TorrentFreak, EdSurge, CyberNews, Mayo Clinic, Towards Data Science, etc. | ~16.8% |
| Education & Research | Harvard Business Review, Brookings.edu, Business Insider | ~6.6% |
| Consulting & Analysts | McKinsey, Gartner, Deloitte, IBM | ~4.8% |
Classification of cited domains by type.
Tech publishers and review outlets are highly influential on LLM queries. Established sites like TechRadar and CNET have the kind of authoritative, comprehensive articles that ChatGPT loves to cite. These sites often update their content regularly and cover "top X" lists that align well with user queries, making them prime targets for citations.
Official product pages for leading tools get cited frequently, more than we initially expected. In queries asking for the "best" products, ChatGPT would not only list those products but often cite the product's website to back up claims about features or pricing.
For example, in answers about VPN services, ChatGPT might say "NordVPN offers fast speeds and strong encryption" and cite NordVPN's site as evidence. This underscores that having clear, factual information on your official site can directly earn you citations in AI-generated answers.
Educational and research sources (like HBR.org, Brookings, or arXiv papers) constituted a smaller slice of total citations (~9%), but were prominent for certain types of questions (primarily conceptual or trend-focused queries).
For instance, Harvard Business Review articles appeared often in answers to questions about AI's impact on industry or strategy. These sources tend to provide deep insight or data, indicating that ChatGPT turns to them for authoritative context.
Consulting firms and industry analysts (McKinsey, Gartner, etc.) were only rarely cited (~1%). This was a bit surprising, given their thought leadership reports.
It's possible that their content is gated or not as frequently surfaced by web search, or that ChatGPT had sufficient information from other sources. In any case, traditional consulting whitepapers didn't significantly influence ChatGPT's answers in our sample.
The long tail of "other" domains (over 50% of citations) includes everything from niche blogs to community forums and Q&A sites. This diversity means that almost any quality content, even from lesser-known sites, can potentially be pulled in by ChatGPT if it directly addresses the user's query.
We saw citations from government sites (e.g., NIST.gov for cybersecurity standards), developer documentation (IBM's knowledge base for programming tool queries), and even Reddit threads for specific troubleshooting questions.
The caveat: each of these individual sites contributed only a few citations. Together, they form the fabric of the AI's broad knowledge base.
In short, dominant citation sources fall into two camps: authoritative publishers and trusted product/service sites, with a healthy mix of others filling specific niches.
For brands, this highlights two paths to being cited: become a go-to publisher of information in your domain, or build your product's site into a trusted source of facts and specs. Ideally, do both.
Does ChatGPT prefer fresh, recently updated content, or does it lean on evergreen information? This is a crucial question for content strategy, since one might assume an AI with up-to-date browsing would always fetch the latest articles.
Our analysis suggests a nuanced answer: ChatGPT's citations tend to reflect content freshness when it's relevant, but not at the expense of authority.
Keep your content updated when appropriate. If you operate a site that publishes rankings, how-tos, or guides, updating them with current information can increase the chances that an LLM will pick your page as a source for current queries.
That said, don't update just for the sake of a new timestamp; update to maintain relevance and accuracy.
ChatGPT's behavior suggests it values a mix of up-to-date and authoritative information. The best scenario is having content that is both definitive and regularly updated with new data, for example, a comprehensive guide in your field that stays current over time.
Not all queries are created equal. We categorized the user queries into a few intent types (broadly Informational, Commercial Investigation, and Transactional) to see how the citation patterns differed. Here's how we defined them:
When we mapped queries to intent types, ~73% were Commercial, ~23% Informational, and ~4% Transactional. The skew toward "best/tools" queries reflects the dataset's focus on tool-related topics. Still, citation patterns varied notably across these categories.
- **Commercial Investigation:** comparison and product-discovery queries.
- **Informational:** educational and explanatory queries.
- **Transactional:** action-oriented queries.
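A crude way to approximate this kind of intent bucketing is keyword heuristics. This is a simplified stand-in for KIVA's classifier, and the keyword lists are assumptions, not its actual rules:

```python
# Assumed keyword lists; a production classifier would be far more nuanced.
COMMERCIAL = ("best", "top", "vs", "alternatives", "review")
TRANSACTIONAL = ("buy", "pricing", "discount", "sign up", "download")

def classify_intent(query: str) -> str:
    """Bucket a query as transactional, commercial, or informational."""
    q = query.lower()
    if any(k in q for k in TRANSACTIONAL):
        return "transactional"
    if any(k in q for k in COMMERCIAL):
        return "commercial"
    return "informational"

print(classify_intent("best AI optimization tools"))  # commercial
print(classify_intent("HubSpot pricing plans"))       # transactional
print(classify_intent("what is generative search"))   # informational
```

Even this blunt heuristic reproduces the skew we saw: "best/top tools" phrasing dominates tool-related keyword sets, which is why Commercial queries made up ~73% of our sample.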
As a founder, analyzing these citation patterns completely changed my perspective. Talking about AI's impact on search is one thing, but seeing how an LLM selects its sources is eye-opening.
Here are some honest insights and lessons we at Wellows learned firsthand:
We noticed how often ChatGPT cited official product sites for top-ranked tools. This drove home a point: if you manage to be one of the top recommendations in your category, the AI will likely mention you and even use your site as evidence.
It's like winning two prizes: first, the human-crafted "best of" lists include you, then AI amplifies your presence by citing your own domain. For us, this emphasized the importance of product excellence and reputation. No amount of SEO tricks will get you cited by ChatGPT as "one of the best" if you truly aren't regarded as such by the source material it trusts.
In an age of generative search, quality and brand reputation are more critical than ever - there's no gaming the system when an AI is synthesizing consensus opinion.
Initially, I thought big-name domains would completely dominate citations. While high-authority sites did lead, the long tail showed that authority, in AI eyes, is very context-specific. If someone asks about machine learning frameworks, an obscure blog by an ML engineer might get cited because it has the precise answer, even if that blog would never outrank TensorFlow's official site on Google.
This taught us not to discount niche expertise. At Wellows, we encourage deep-dive content on specific subtopics even if it targets a niche. If it's the best answer on that topic, ChatGPT might surface it. In short, be the Wikipedia of your niche.
We now do "SEO and GEO" - Search Engine Optimization and Generative Engine Optimization. Traditional SEO isn't going away, but we now also evaluate how LLMs consume our content.
Are we using semantically rich language that mirrors the kinds of queries users type? Our LLM Pattern Analysis Checklist breaks down these patterns into actionable steps, ensuring content is both SEO-friendly and optimized for AI-driven discovery.
We began adding FAQ sections and summary bullets to our articles, anticipating that AI might favor well-formatted snippets. We also started paying more attention to schema markup and metadata to boost contextual clarity. Even though LLMs don't directly use schema the way search engines do, it still helps signal intent.
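As a concrete example, an FAQ section can be mirrored in FAQPage structured data. This is a generic schema.org JSON-LD snippet built in Python; the question and answer text are illustrative, and nothing here is a claim about how any particular LLM consumes the markup:

```python
import json

# schema.org FAQPage structured data, built as a plain dict
faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Which sites does ChatGPT cite most often?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "In our dataset, tech media such as TechRadar and CNET "
                        "led citations, followed by official product sites.",
            },
        }
    ],
}

# Emit the JSON-LD you would embed in a <script type="application/ld+json"> tag.
print(json.dumps(faq, indent=2))
```

The same page content serves both audiences: readers see the FAQ prose, while crawlers get an unambiguous question/answer structure.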
Some domains, like HBR or Mayo Clinic, consistently popped up in ChatGPT's answers for critical topics. That's because ChatGPT appears to favor sources that ooze credibility, reflecting an internalized sense of trust and authority.
This reinforced our strategy of becoming a trusted resource in our domain. In practice, that means E-E-A-T: Experience, Expertise, Authoritativeness, and Trustworthiness, principles that now influence both SEO and LLM relevance [lseo.com].
Throughout this journey, a humbling realization is that we're all learning how these AI systems work. The data gave us clues, but the algorithms are complex and evolving. We have to remain adaptive.
One week, a certain site might be the source for ChatGPT, and a month later, a new, better source could replace it as the model's knowledge updates.
So our mindset at Wellows is to stay curious, keep testing, and never assume we've "figured it out" completely. The goal is to continuously align our strategies with the AI's behavior, which, in a way, keeps us honest about making genuinely useful content.
Wellows is envisioned as an AI-first marketing platform that acts as an autonomous marketer for your brand. It deploys a "Visibility Stack" of intelligent agents to uncover the exact questions your audience is asking across Google and leading LLMs like ChatGPT, Gemini, Claude, and DeepSeek.
Wellows identifies visibility gaps, prioritizes high-impact content opportunities, generates citation-worthy content, and structures it for maximum discoverability. In addition, it publishes with precision, continuously monitors every mention, flags drops, and automatically updates or prunes assets so your growth engine remains self-learning, always-on, and headcount-free.
To earn AI citations, be the best answer: publish the most useful, accurate, and relevant content. Algorithms reward quality, authority, and relevance. Focus on standout value and clarity.
Early findings from our ChatGPT analysis indicate that brands with strong content tend to receive more citations. Keep adapting as AI evolves.
Bottom line: Experiment, analyze, and keep improving. The field is new, so there's room to lead.