Perplexity Accused of Secretly Crawling Prohibited Website Content

AIbase基地

Published in AI News · 5 min read · Aug 5, 2025

A new research report from internet infrastructure provider Cloudflare accuses AI startup Perplexity of ignoring explicit blocking instructions when scraping website content. Cloudflare said it observed Perplexity concealing its identity while scraping web pages, thereby circumventing the preferences those websites had set.


Artificial intelligence products such as Perplexity typically rely on collecting large amounts of data from the internet, and AI startups have long scraped text, images, and videos without permission to keep their products running. In recent years, many websites have tried to address this with the standard robots.txt file, which tells search engines and AI companies which pages may be crawled or indexed and which may not. These efforts have had limited effect, however.
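To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `PerplexityBot` token, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not details taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    chrome_like = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    for agent in ("PerplexityBot", chrome_like):
        print(agent[:20], "->", is_allowed(ROBOTS_TXT, agent, url))
```

The rules are matched against whatever name the client declares, and nothing forces a client to run this check at all. That is one reason robots.txt works only as a voluntary signal rather than as an enforcement mechanism.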

According to Cloudflare's analysis, Perplexity appears to bypass these restrictions by changing its bot's "user agent," a string sent with each request that identifies the visitor's software, such as the browser type and version and the device. Cloudflare also said that Perplexity changed the autonomous system number (ASN) its requests came from; an ASN is a numeric identifier for a large network on the internet. Cloudflare observed this behavior across tens of thousands of domains and millions of requests, and said it identified the crawler by combining machine learning with network signals.
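The following toy sketch illustrates why a rule keyed only on the declared user agent and the originating ASN is easy to sidestep. It is not Cloudflare's detection method, which the report describes only as combining machine learning and network signals. The `Request` type, the `classify` function, the `PerplexityBot` token, and the ASN values (taken from the range reserved for documentation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_agent: str  # self-declared by the client
    asn: int         # network the request arrived from


# Assumptions for illustration only.
DECLARED_UA_TOKEN = "perplexitybot"
EXPECTED_ASNS = frozenset({64496})  # documentation-range ASN, not a real network


def classify(req: Request) -> str:
    """Label a request using only its declared identity and origin network."""
    declares_bot = DECLARED_UA_TOKEN in req.user_agent.lower()
    expected_network = req.asn in EXPECTED_ASNS
    if declares_bot and expected_network:
        return "declared"
    if declares_bot:
        return "declared-ua-unexpected-network"
    # A client that changes both signals is indistinguishable from any
    # ordinary visitor as far as this rule is concerned.
    return "undeclared"


if __name__ == "__main__":
    samples = [
        Request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", 64496),
        Request("Mozilla/5.0 (Macintosh) Chrome/124.0.0.0 Safari/537.36", 64497),
    ]
    for req in samples:
        print(classify(req), "<-", req.user_agent)
```

Because both fields can be changed by the sender, a rule like this only catches crawlers that identify themselves honestly, which is why additional signals are needed to spot one that does not.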

Jesse Dwyer, a spokesperson for Perplexity, rejected Cloudflare's accusations, calling the company's blog post "salesmanship." He added that the screenshots in the post did not show any content being accessed, and he further claimed that the crawler Cloudflare described did not belong to Perplexity. Cloudflare said it first noticed the issue after customers complained that Perplexity was still scraping their content even though they had blocked the crawler in their robots.txt files.

Cloudflare's analysis indicates that Perplexity used not only its declared user agent but also, when that was blocked, a generic browser user agent impersonating Google Chrome. As a result, Cloudflare removed Perplexity's crawler from its list of verified bots and deployed new techniques to block its activity.

Notably, Cloudflare has recently taken a firm stance against AI crawlers and launched a marketplace that lets website owners charge AI crawlers for access to their sites. Cloudflare CEO Matthew Prince has warned that AI is disrupting the business models of the internet, especially publishers' revenue models. This is not the first time Perplexity has faced allegations of unauthorized scraping; media outlets such as Wired have previously accused Perplexity of copying their content.

Key points:
🌐 Cloudflare accuses Perplexity of ignoring website blocking instructions when scraping content.
🤖 Perplexity allegedly bypasses website protection measures by changing user agents and network identifiers.
📉 Cloudflare launched a marketplace that lets websites charge AI crawlers for access, to help protect their content.

This article is from AIbase Daily
