On June 30, 2025, Cloudflare blocked Perplexity AI from accessing websites through its infrastructure. The reason? Scraping without disclosure, permission, or attribution. Within hours, an internet-wide debate ignited: should AI agents be treated as users or as bots? Are they the future of search, or the end of content ownership?

This wasn’t just about Perplexity. It was a shot across the bow for every large language model (LLM) consuming the modern web — ChatGPT, Claude, Gemini, DeepSeek, LLaMA, and the rest. These systems don’t just browse pages. They retrieve, synthesize, and respond.

But here’s the question brands, publishers, and SEOs are now asking: are AI agents web users? And what does it actually mean to treat AI as a web user?

Referral data shows AI agents, especially ChatGPT, driving growing referral traffic—reaching over 300M monthly referrals by mid-2025—signaling that users are increasingly shifting from traditional web search to LLMs for information. In other words: how do AI web users impact online traffic?

TL;DR — What You’ll Learn in This Blog:

  • Why Cloudflare blocked Perplexity—and what it means for LLM access
  • How LLMs behave more like users than bots (and why that matters)
  • What this shift means for publishers, SEOs, and content creators
  • How AI-generated answers change attribution, consent, and value exchange
  • The real implications for web traffic, monetization, and data control
  • Why we need a new framework to understand “usage” in the GEO era

Whether you believe it’s fair use, theft, or innovation, the reality is clear: LLMs are reshaping how content is consumed, making generative engine optimization (GEO) more important by the day. They’re not just intermediaries between users and websites. Increasingly, they are the interface. And they’re not clicking through.


How do AI agents differ from bots on the open web?

AI agents and traditional bots on the open web differ in terms of intelligence, adaptability, and independence.

Traditional Bots:

  • Rule-Based Execution: Bots work on fixed scripts and pre-set rules, carrying out tasks like web crawling, scraping data, or sending automated replies. They are reliable for repetitive and structured activities but lack flexibility.
  • Low Adaptability: Bots cannot adjust to new conditions or learn from experiences. To function in a different context, they require manual updates or reprogramming.

AI Agents:

  • Learning and Contextual Decisions: AI agents use machine learning and natural language processing to interpret context, make informed choices, and get better over time. They are capable of handling complex and evolving tasks.
  • Autonomy and Collaboration: AI agents can operate independently, taking actions without human intervention. They can also coordinate with other agents and systems to accomplish larger goals.
  • Context Awareness: AI agents process contextual details to deliver personalized and human-like responses, making them effective in scenarios that require reasoning rather than rule-following.

In summary, while bots are suited for simple, repeatable tasks in fixed frameworks, AI agents stand out for their ability to learn, adapt, and act autonomously, enabling them to manage complex tasks and interact more naturally with users.


What Does it Mean to be a “User” of the Web Today?

For decades, the definition of a user was simple: a human with a browser. Someone who typed, clicked, scrolled, and triggered analytics. But that definition no longer holds.

Today, LLMs like ChatGPT, Claude, and Perplexity interact with the web in ways that mirror traditional users—effectively, AI agents as website users. They access content, extract meaning, and generate outputs that shape real human decisions—without ever loading a webpage in a browser.

| Feature | Human User | LLM (e.g., ChatGPT, Perplexity) |
| --- | --- | --- |
| Requests and retrieves web content | ✔️ via browser | ✔️ via APIs or crawlers |
| Interprets page meaning | ✔️ reads and comprehends | ✔️ parses and synthesizes |
| Takes action on content | ✔️ clicks, shares, saves | ✔️ generates summaries, citations |
| Leaves analytics trail | ✔️ sessions, events | ❌ often invisible or masked |
| Returns for repeated engagement | ✔️ revisits, subscribes | ✔️ queries regularly at scale |
| Monetizes via ads, subscriptions | ✔️ ad views, purchases | ❌ no direct monetization |

LLMs don’t just crawl the web—they use it. They extract value, repurpose knowledge, and shape downstream user behavior. In many cases, they’re the first point of contact between content and human readers.
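If you want a rough sense of how much of your own traffic is machine-driven, one low-tech option is to scan your server access logs for self-identified AI crawler user agents. The sketch below is a minimal, illustrative Python example: the user-agent names and log path are assumptions to adapt to your own stack, and agents can mask or omit their user agent entirely, which is exactly why the analytics trail above is marked as often invisible.

```python
from collections import Counter

# Minimal sketch: count hits from self-identified AI crawlers in an access log.
# These substrings are commonly documented crawler names, but treat the list as
# an assumption and verify it against each vendor's current documentation.
AI_AGENT_HINTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def count_ai_hits(log_path: str = "access.log") -> Counter:
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            for hint in AI_AGENT_HINTS:
                if hint in line:
                    counts[hint] += 1
    return counts

if __name__ == "__main__":
    print(count_ai_hits())  # e.g. Counter({'GPTBot': 120, 'PerplexityBot': 34})
```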


Why Did Cloudflare Ban Perplexity, and What Does It Reveal About the Future of Web Access?

On June 30, 2025, Cloudflare blocked Perplexity AI from accessing sites on its network. The reason? Perplexity was scraping content at scale—bypassing robots.txt (and emerging controls like llms.txt), avoiding attribution, and routing requests through third-party clouds like AWS and Azure without identifying itself.

Cloudflare called it a violation of publisher rights and web norms, forcing a decision: should LLM traffic be treated as agents acting on behalf of internet users, with access rights, or as bots subject to stricter controls?

But the response wasn’t universally welcomed. Perplexity’s rebuttal argued that AI agents are extensions of users, not bots acting independently. Blocking them, they claimed, is the equivalent of charging users for accessing public information through new tools. And that’s the heart of the debate.


“If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone — creators, consumers, tomorrow’s AI founders, and the future of the web itself.” — Matthew Prince, CEO, Cloudflare

While Cloudflare frames the move as protecting publishers, critics see it as a land grab—positioning itself as a toll booth on the open web, deciding who gets access and at what cost.

Diagram: crawlers first check robots.txt, then WAF rules; if blocked, they may switch user agent or IP, otherwise they scrape or crawl the content.


How do LLMs Approach Web Search?

LLMs don’t browse like humans. They retrieve, filter, and synthesize—through tightly controlled systems. Here’s how it works:

Large Language Models (LLMs) enhance web search capabilities by combining real-time information retrieval with their generative abilities. This integration enables them to deliver responses that are more accurate, timely, and contextually relevant.

Retrieval-Augmented Generation (RAG):
One of the most widely used techniques is Retrieval-Augmented Generation (RAG). In this approach, the model retrieves relevant information from external sources, such as live web data, before producing an answer. The process typically involves:

  1. Retrieval: The LLM queries external databases or the internet to gather useful documents or data.
  2. Augmentation: The retrieved content is added to the model’s context.
  3. Generation: The model generates a response that blends both its existing knowledge and the newly acquired information.

This method ensures responses are grounded in the most recent and relevant data, reducing the risk of outdated or inaccurate answers.
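To make those three steps concrete, here is a minimal RAG sketch in Python. The retrieve() and llm_generate() functions are hypothetical stand-ins for a real search backend and a real model call; this illustrates the pattern, not how any specific vendor’s pipeline is actually implemented.

```python
# Minimal RAG sketch: retrieve -> augment -> generate.
# retrieve() and llm_generate() are hypothetical placeholders, not real APIs.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k most relevant passages for the query (stubbed here)."""
    # In practice this would hit a search API or a vector index.
    return ["passage about the topic ...", "another relevant passage ..."][:k]

def llm_generate(prompt: str) -> str:
    """Call a language model with the augmented prompt (stubbed here)."""
    return "[model answer grounded in the supplied context]"

def answer(query: str) -> str:
    # 1. Retrieval: gather candidate documents from an external source.
    passages = retrieve(query)
    # 2. Augmentation: add the retrieved content to the model's context.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # 3. Generation: blend the model's knowledge with the fresh information.
    return llm_generate(prompt)

print(answer("Why did Cloudflare block Perplexity?"))
```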

Integration with Search APIs:
LLMs also leverage specialized search APIs to streamline real-time data access. These APIs provide large-scale search capabilities designed for LLMs, allowing them to efficiently retrieve, process, and integrate live web information into their outputs.
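What might the retrieval half of that pipeline look like against a real-time search API? A hedged sketch is below: the endpoint, API key, and response shape are all assumptions standing in for whichever search provider you integrate. The point is simply that the LLM layer receives pre-fetched, structured snippets rather than browsing pages itself.

```python
import requests

# Hypothetical search API; swap in your provider's real endpoint and schema.
SEARCH_ENDPOINT = "https://api.example-search.com/v1/search"
API_KEY = "YOUR_API_KEY"

def retrieve(query: str, k: int = 3) -> list[str]:
    """Fetch the top-k live results and flatten them into text snippets."""
    resp = requests.get(
        SEARCH_ENDPOINT,
        params={"q": query, "limit": k},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    # Assumed response shape: {"results": [{"title", "snippet", "url"}, ...]}
    return [
        f"{r['title']}: {r['snippet']} ({r['url']})"
        for r in resp.json().get("results", [])[:k]
    ]
```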

Challenges and Considerations:
While integrating web search significantly improves LLM performance, it also introduces a set of challenges:

  • Information Overload: Pulling in too much or irrelevant data can overwhelm the model, making answers less clear.
  • Data Quality: The accuracy of results depends on the reliability and credibility of the retrieved sources.
  • Processing Costs: Real-time retrieval can be resource-intensive, impacting both efficiency and scalability.

To overcome these issues, advanced solutions have been developed that filter and prioritize retrieved content, cut through noise, and rank results by relevance. This not only optimizes token usage but also improves the clarity and quality of generated responses.
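One common-sense way to tame information overload and token cost is to score, deduplicate, and budget the retrieved passages before they ever reach the model. The snippet below is an illustrative sketch, assuming relevance scores already exist (from a search API or embedding similarity) and using word count as a rough proxy for tokens.

```python
def prioritize(passages: list[tuple[float, str]], token_budget: int = 2000) -> list[str]:
    """Keep the highest-scoring, non-duplicate passages within a rough token budget."""
    selected: list[str] = []
    seen: set[str] = set()
    used = 0
    for score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        key = text.strip().lower()
        if key in seen:                 # drop exact duplicates (noise reduction)
            continue
        cost = len(text.split())        # crude stand-in for a real token count
        if used + cost > token_budget:  # stop once the context budget is spent
            break
        selected.append(text)
        seen.add(key)
        used += cost
    return selected

# Example: the duplicate is dropped; both unique snippets fit the budget.
print(prioritize([
    (0.9, "Highly relevant snippet."),
    (0.9, "Highly relevant snippet."),
    (0.4, "Marginal snippet."),
]))
```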


What are the challenges in developing AI agents capable of autonomous web navigation?

Developing AI agents capable of autonomous web navigation presents several significant challenges:

  1. Dynamic and Complex Web Environments
    Modern websites rely heavily on dynamic content, client-side rendering, and advanced JavaScript frameworks. This makes it difficult for AI agents to consistently interpret and interact with web elements. Non-standard HTML, frequent design changes, and hidden scripts can all disrupt navigation and reduce reliability.
  2. Visual Processing Limitations
    Autonomous agents often struggle with visual reasoning, such as distinguishing key elements on cluttered interfaces or understanding spatial relationships between objects. These limitations can result in errors like selecting the wrong link, overlooking important content, or misinterpreting page layouts.
  3. Data Quality and Bias
    The effectiveness of AI agents depends on the quality of their training data. Biased, incomplete, or unrepresentative datasets can lead to flawed decision-making, discrimination, or limited generalization. This risk becomes particularly concerning in high-stakes areas such as healthcare, finance, or policy.
  4. Security and Privacy Concerns
    Because autonomous web agents interact directly with online systems, they are exposed to cyber threats such as adversarial inputs, phishing attacks, and data leaks. Ensuring strong safeguards, encryption, and privacy protocols is crucial, especially when handling sensitive or personal information.
  5. Ethical and Legal Implications
    AI agents acting independently raise complex questions of accountability and regulation. They may unintentionally breach copyright, violate terms of service, or make ethically questionable decisions. Clear legal guidelines, oversight mechanisms, and governance frameworks are required to manage these risks effectively.
  6. Error Propagation in Multi-Step Tasks
    Many navigation tasks involve multiple sequential actions, such as logging in, searching, and extracting data. Even a small mistake early in the workflow can compound into larger failures, making recovery difficult and reducing task completion success. Designing error-tolerant and self-correcting systems remains a key challenge.

Why This Matters: You’re Not in Control

Here’s the uncomfortable truth: as a user, you don’t control how these systems search the web. You’re relying on:


  • Index freshness (which you can’t influence)
  • Retrieval scope (how many pages are fetched)
  • Filtering layers (what gets passed to the model)

And each of those layers is optimized for cost, speed, and safety—not completeness.

If ChatGPT limits itself to one webpage per query to reduce token usage, that affects your result. If Perplexity deprioritizes a site that hasn’t been indexed recently, that affects what it cites. And if Gemini decides summaries are better than links, you may never reach the original source at all.

Design pages on the assumption that AI agents, acting as website users, could be your first audience.


Are LLMs replacing humans in online discovery?

The web used to be a direct conversation between people and pages. A user searched, scanned results, clicked a link, and explored the site. But today, that first touchpoint is increasingly handled by something else entirely: a language model.

LLMs don’t just help with discovery — they’re starting to replace it.

According to The Wall Street Journal, 5.6% of search activity in the United States was directed to chatbots instead of traditional search engines in June 2025—an increase from 2.48% in June 2024 and 1.3% in January 2024.

That’s not a rounding error. It’s the early signal of a structural shift. Here’s how the discovery chain now often works:

  1. User asks a question in ChatGPT, Perplexity, or Claude
  2. The model fetches sources, synthesizes an answer
  3. The user reads the answer — and never visits the original sites
  4. Credit, context, and traffic are stripped away

Here’s a visual representation of how ChatGPT searches the web:

Diagram: ChatGPT routes questions through Bing search and web-visit steps, then passes the retrieved information to GPT-4 to compose an answer.

This means content is still being found, but not by humans. It’s being consumed, parsed, and reused by machines — then reshaped into answers that render visits optional.


How are AI agents transforming web browsing experiences?

AI agents are reshaping web browsing by turning browsers into intelligent assistants that understand intent, automate processes, and deliver personalized interactions. This transformation can be seen through several key developments:

1. Rise of AI-Integrated Browsers

Companies are embedding AI into browsers to improve productivity and user experience:

  • Opera’s Neon Browser: Neon introduces built-in AI that can perform actions directly on webpages. With features like Neon Do, it autonomously navigates and completes tasks while prioritizing privacy and speed.

  • Microsoft Edge’s Copilot Mode: Edge now includes Copilot Mode, which helps users by structuring search flows, comparing information across multiple tabs, and enabling voice-based commands to streamline browsing.

2. Automation of Online Workflows

AI agents are reducing manual effort by managing complex digital tasks:

  • OpenAI’s Operator Tool: Operator interacts with forms, buttons, and menus to accomplish tasks like organizing to-do lists or planning trips, making browsing more action-driven.

  • TinyFish’s AI Agents: TinyFish develops intelligent web agents that simulate human browsing, automating tasks such as price tracking and large-scale data collection for industries like retail and travel.

3. Emergence of the Agentic Web

The internet is evolving into an ecosystem where AI agents act independently and collaboratively:

  • Agentic Web Frameworks: Frameworks like webMCP embed structured metadata into webpages, enabling AI systems to interpret and interact with content more effectively and with less computational cost.

  • SkillWeaver: This system allows agents to improve autonomously by discovering, refining, and reusing skills, making them progressively more capable over time.

4. Personalization and Interaction Upgrades

AI is making browsing experiences more tailored and interactive:

  • Personalized Engagement: By analyzing browsing history, location, and past purchases, AI agents deliver customized content, promotions, and recommendations.

  • Streamlined Support: Agents provide instant help for common queries, track orders, and resolve issues in real time, leading to more efficient customer service.

How Are AI Agents Like ChatGPT, Perplexity, and Claude Changing Web Browsing and Monetization?

AI agents like ChatGPT, Claude, Gemini, and Perplexity are emerging as the new gateway to online information.

Rather than acting as passive tools, they now function as active intermediaries, retrieving, interpreting, and presenting knowledge directly to users.

This change is reducing the need for traditional human browsing and placing growing pressure on the ad-driven business models that have sustained the open web for decades.

Decline in Traditional Web Traffic

For years, the web economy thrived on clicks, pageviews, and subscriptions. Every visit translated into measurable engagement and monetization opportunities.

But AI agents are breaking this cycle. A user can simply ask a question, receive a summarized response, and move on—without ever landing on the original site.

The data confirms this trend. In the United States, chatbot-driven search activity grew from 1.3% in early 2024 to 5.6% by June 2025.

Meanwhile, ChatGPT and similar systems already generate over 300 million monthly referral interactions.

While this shows AI can still direct some traffic, the majority of interactions stop within the AI interface, leaving publishers invisible.

Pressure on Advertising Revenue

This shift undermines the traditional advertising model. Ads rely on human attention—impressions, clicks, and time spent on site.

But AI agents don’t view banners, skip videos, or trigger analytics events. Even when they use publisher content to build responses, they often fail to attribute the source, stripping away brand visibility.

Advertisers face what experts call an “attention lemons” problem: as more traffic flows through AI intermediaries, the quality of attention becomes uncertain.

This lowers trust in ad metrics, reduces pricing power, and destabilizes the very foundation of open-web monetization.

Emerging Monetization Strategies

In response, publishers and businesses are experimenting with new models better suited for an AI-first internet:

  • API Access & Licensing – Delivering structured data through paid APIs or licenses so AI systems compensate publishers for usage.
  • Generative Engine Optimization (GEO) – Going beyond SEO by structuring content to be machine-readable, improving the odds of being retrieved and cited in AI-generated answers.
  • Direct AI Integration – Embedding services, product catalogs, or datasets inside AI platforms, ensuring visibility even if users never visit the website.
  • Pay-to-Play Access – Restricting or gating AI crawlers unless access agreements are made, a trend highlighted by Cloudflare’s high-profile block of Perplexity in mid-2025.

Implications for the Future of the Open Web

This transition forces a redefinition of what it means to “use” the web. Historically, a user was a human browsing pages.

Today, an increasing share of that role belongs to AI agents acting on behalf of humans. If this trend accelerates, the ad-supported open web will struggle to survive in its current form.

Large publishers may adapt by striking deals with AI companies, while smaller creators risk invisibility.

The future will likely depend on new economic models—where licensing, API access, and AI partnerships replace traditional clicks and impressions as the engines of monetization.

AI agents are rapidly replacing human browsing as the primary gateway to information.

This weakens traditional ad-based revenue but opens the door to new models like licensing, APIs, and direct integration with AI platforms.

The open web will not disappear, but its economic structure is being rewritten—driven by a world where visibility depends as much on machines retrieving content as on humans clicking links.


How Can Marketers Adapt to LLMs Crawling Web Search?

As large language models (LLMs) like ChatGPT, Perplexity, and Google’s Search Generative Experience reshape the search landscape by surfacing direct answers instead of only listing web links, marketers need to shift their strategies to stay relevant and visible. Here are some practical approaches:

  1. Structure Content for Machine Readability

LLMs look for precise, structured information. To boost discoverability:

  • Use Clear Headings and Takeaways: Break content into logical sections with descriptive titles and provide upfront summaries to support AI extraction.
  • Leverage Modular Content: Present ideas in independent blocks so that each section can stand on its own when pulled into AI-generated answers.

This makes your content easier for AI to process and more likely to be included in responses.

  2. Establish Topical Authority

Search models lean toward trusted and authoritative sources. To strengthen your position:

  • Build In-Depth Content Hubs: Cover key industry themes with comprehensive guides and related posts to reinforce your brand’s expertise.
  • Invest in Digital PR: Share original data, insights, or reports that earn citations from credible publications.

A reputation for authority improves your chances of being referenced by AI-driven search systems.

  3. Optimize for Conversational Queries

Because LLMs are designed around natural language, it’s important to align content with how users phrase questions:

  • Provide Direct Answers: Anticipate FAQs in your niche and respond with clear, concise explanations.
  • Adopt a Conversational Tone: Mirror how people ask questions so your content feels natural in AI-powered results.

This alignment increases the likelihood of your material being pulled into AI answers.

  4. Ensure Technical Accessibility

If your site isn’t technically accessible, LLMs may miss your content. To prevent that:

  • Keep Websites Crawlable: Check robots.txt and ensure that essential information isn’t hidden behind scripts or gated content.
  • Add Structured Data: Use schema markup so AI can better understand your content’s meaning and context.

Strong technical foundations improve the way LLMs interpret and surface your content.
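As a quick sanity check on the first bullet above, you can verify what your robots.txt actually tells common AI crawlers using nothing more than Python’s standard library. The user-agent names below are the commonly documented ones; treat the list as an assumption and confirm it against each vendor’s docs.

```python
from urllib.robotparser import RobotFileParser

# Commonly documented AI crawler user agents (verify against vendor docs).
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def check_ai_access(site: str, path: str = "/") -> dict[str, bool]:
    """Report whether each AI crawler may fetch `path` according to robots.txt."""
    rp = RobotFileParser()
    rp.set_url(f"{site.rstrip('/')}/robots.txt")
    rp.read()  # downloads and parses the live robots.txt
    return {
        agent: rp.can_fetch(agent, f"{site.rstrip('/')}{path}")
        for agent in AI_CRAWLERS
    }

if __name__ == "__main__":
    print(check_ai_access("https://example.com"))  # e.g. {'GPTBot': True, ...}
```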

  5. Monitor AI Visibility

Understanding how your brand appears in AI-generated outputs is crucial:

  • Track Mentions Across AI Tools: Regularly review how platforms like ChatGPT or Google’s AI Overviews reference your brand.
  • Adjust Based on Insights: Correct inaccuracies and fill content gaps to shape how AI represents your brand.

Consistent monitoring helps refine your strategy and maintain presence in the AI-first search ecosystem.
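There is no standard dashboard for this yet, but you can run rough spot-checks by asking the models directly. The sketch below uses the official OpenAI Python client as one example; the model name, prompt, and brand are illustrative assumptions, and answers vary between runs, so treat this as sampling rather than a definitive audit.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def brand_snapshot(brand: str, model: str = "gpt-4o-mini") -> str:
    """Ask one model what it currently says about a brand (illustrative only)."""
    response = client.chat.completions.create(
        model=model,  # assumed model name; substitute whatever you have access to
        messages=[{
            "role": "user",
            "content": f"What do you know about {brand}? Cite sources where possible.",
        }],
    )
    return response.choices[0].message.content

print(brand_snapshot("Acme Analytics"))  # hypothetical brand name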

By taking these steps, marketers can successfully adapt to the rise of LLM-powered search, ensuring their content remains visible, credible, and aligned with AI-driven discovery.


How Can KIVA Help You to Adapt to LLMs Crawling Web Search?

KIVA, an AI SEO Agent, aligns your content with how AI agents actually interact with the web—so you’re not just ranked, you’re included.


  • LLM Optimization helps you identify what queries ChatGPT, Perplexity, and Claude prioritize.

  • Content Briefs offer LLM-friendly outlines with suggested patterns, semantic structure, and community insights.

  • Content Creator generates drafts built for retrieval—optimized for how LLMs parse, cite, and synthesize content.



FAQs

How do LLMs affect search traffic?
LLMs reduce reliance on search result pages by answering queries directly. This lowers click-through traffic and forces marketers to optimize content for inclusion in AI-generated answers—not just for traditional search rankings.

How is GEO different from SEO?
SEO focuses on ranking in search engine results. GEO is about making your content structured, retrievable, and usable by LLMs so that it can appear in AI-generated outputs—even when no links are shown.

Can LLMs access my website’s content?
Yes. Many LLMs retrieve public content from the web unless explicitly blocked via robots.txt or infrastructure rules. Some platforms like Cloudflare have begun enforcing these boundaries, sparking debate over consent and access.

What kind of content do LLMs prefer to cite?
LLMs favor original, structured content that is concise, source-backed, and clearly written. Use schema markup, clear headings, and straightforward language to make your content more likely to be retrieved and cited.


What Comes Next for Web Discovery in the Age of GEO?

Generative search isn’t a side feature anymore—it’s the front door. And as LLMs become the default interface for finding, filtering, and delivering information, traditional discovery models are being rewritten.

Generative Engine Optimization (GEO) is no longer optional. It’s the strategic layer above SEO—focused not just on ranking in search, but on being retrieved, cited, and trusted by AI systems.

Web discovery is shifting from clicks to content availability in LLM pipelines. The winners won’t be the loudest—they’ll be the most usable, the most parsable, and the most aligned with how machines now evaluate relevance.

Key Takeaways for the Impact of LLMs on Web Search 

  1. Inclusion > Indexing: Being crawled isn’t enough. You need to be structured, trusted, and retrievable to appear in generative answers.
  2. Optimize for Retrieval, Not Just Ranking: Use clear headers, concise answers, structured formats, and up-to-date metadata to improve machine usability.
  3. Monitor AI Surfacing: Track where and how your content appears in LLMs. Tools will emerge to help audit inclusion across ChatGPT, Perplexity, Claude, and others.
  4. Build Machine-Friendly Authority: LLMs reward clarity, originality, and specificity. Publish source material, expert quotes, data-backed insights—not vague summaries.
  5. Expect a Pay-to-Play Future: As models scale, access may be gated. Partnerships, opt-ins, or paid access to your content may determine whether you’re in or out.
  6. GEO Complements SEO: Don’t abandon traditional optimization—it feeds the index that generative models still rely on. But now, it’s the floor, not the ceiling.