Cloudflare has published a report accusing the AI search startup Perplexity of circumventing restrictions meant to keep its web crawlers off certain websites. According to Cloudflare, when Perplexity's crawlers are blocked, the company conceals their identity to get around the preferences site owners have set, including directives in robots.txt files and web application firewall (WAF) rules.

The accusation adds to existing concerns that Perplexity acquires content without authorization. The company was previously criticized for bypassing paywalls and ignoring robots.txt files, but its CEO, Aravind Srinivas, blamed third-party crawlers.

To verify customer complaints, Cloudflare set up new domains with similar access restrictions as a test. The results showed that Perplexity's crawler, which initially identified itself as "PerplexityBot" or "Perplexity-User," changed its user agent immediately after being blocked and presented itself as "Google Chrome running on macOS." Cloudflare stated that this "undisclosed crawler" also rotated IP addresses and switched autonomous system numbers (ASNs) to get around blocks. Cloudflare said it observed this evasion behavior across "tens of thousands of domains and millions of requests per day."
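
The sketch below illustrates why a changed user agent matters for rules keyed to a crawler's declared name. It uses Python's standard `urllib.robotparser` to evaluate a hypothetical robots.txt that disallows the two declared Perplexity user agents. The robots.txt contents, the `example.com` URL, the `is_allowed` helper, and the browser-style user-agent string are all assumptions made for illustration. They are not taken from Cloudflare's test setup, and the example does not model WAF rules, IP rotation, or ASN changes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test domain (illustrative only).
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style user agent resembling Chrome on macOS.
BROWSER_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the hypothetical robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for ua in ("PerplexityBot", "Perplexity-User", BROWSER_LIKE_UA):
        print(f"{ua[:40]!r}: allowed={is_allowed(ua)}")
```

In this sketch the declared crawler names fall under the disallow rules, while a request carrying the browser-style string matches only the wildcard rule. Robots.txt is advisory and depends on a client identifying itself honestly, so a rule keyed to a name stops applying once the client stops using that name.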