
Perplexity Accused of Stealth Crawling – A Robots.txt Breach?

AI Search Engine · Perplexity · Robots.txt · Web Crawling · Cloudflare · Internet Norms · Data Scraping
August 04, 2025
Viqus Verdict: 8/10
Protocol Breach
Media Hype 7/10
Real Impact 8/10

Article Summary

Perplexity AI is embroiled in controversy following claims from Cloudflare researchers that the company's crawlers use deceptive tactics to access websites despite protections like robots.txt files and web application firewalls. Cloudflare alleges that Perplexity deploys stealth bots, rotates IP addresses, and connects through multiple ASNs to circumvent these defenses, behavior that researchers argue violates the decades-old conventions of the Robots Exclusion Protocol. If confirmed, this practice would undermine a fundamental principle of website governance and raise serious questions about data usage and copyright compliance. The allegations are not isolated: Forbes and Wired have leveled similar claims of plagiarism and suspicious traffic patterns, amplifying concerns about Perplexity's data acquisition methods. Perplexity's silence in response to these accusations only adds to the mounting skepticism. The situation highlights a growing tension between AI-driven search engines and the established mechanisms designed to protect website content. The potential legal and ethical ramifications are substantial, and the incident underscores the need for greater transparency and accountability in the rapidly evolving landscape of AI search technologies.

Key Points

  • Perplexity AI is accused of using stealth bots to bypass website restrictions, specifically robots.txt files.
  • Researchers at Cloudflare discovered that Perplexity's crawlers rotate IP addresses and use multiple ASNs to evade website blocks.
  • The allegations highlight a potential violation of established internet norms and raise concerns about copyright infringement and data usage practices.
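The robots.txt mechanism at the center of the dispute is purely advisory: a well-behaved crawler fetches the file and checks its own user-agent against the rules before requesting any page. A minimal sketch of that compliant behavior, using Python's standard-library parser and an illustrative rule set (the bot names and URL below are placeholders, not Perplexity's actual configuration):

```python
# A compliant crawler consults robots.txt before fetching any URL.
# Python's stdlib includes a parser for the Robots Exclusion Protocol.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Illustrative rules: block one named bot, allow everyone else.
rp.parse([
    "User-agent: ExampleNewsBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
])

# The named bot must honor its specific rule set; other bots fall
# through to the wildcard group.
print(rp.can_fetch("ExampleNewsBot", "https://example.com/article"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))    # True
```

The key point is that nothing enforces this check: a crawler that simply skips it, or presents a different user-agent, faces no technical barrier, which is why the alleged evasion is a norms violation rather than a hack.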

Why It Matters

This news is significant for several reasons. First, it touches on core ethical considerations in AI development and deployment: the responsible use of data and respect for intellectual property rights. Second, it exposes a structural weakness in the internet's established infrastructure, since robots.txt is a voluntary convention with no enforcement mechanism, and it shines a light on the potential for large-scale data scraping by AI-powered search engines. Finally, Perplexity's lack of response fuels wider distrust and underscores the need for companies to proactively address concerns about their data acquisition methods. Professionals in cybersecurity, legal tech, and AI development should monitor the situation closely to understand its implications for internet governance and the future of search.
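One common way site operators detect the kind of evasion Cloudflare describes is to verify that requests claiming to be a given crawler actually originate from the IP ranges that crawler's operator publishes; traffic presenting a crawler's user-agent from outside those ranges, or from rotating addresses across many ASNs, is a red flag. A minimal sketch of that range check, using entirely made-up documentation ranges (these are reserved example networks, not any vendor's real addresses):

```python
# Verify whether a request's source IP falls inside the ranges a
# crawler operator publishes for its bots. The ranges here are
# placeholder documentation networks (RFC 5737), not real crawler IPs.
import ipaddress

PUBLISHED_CRAWLER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_declared_crawler(source_ip: str) -> bool:
    """Return True if the IP belongs to a published crawler range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in PUBLISHED_CRAWLER_RANGES)

# Inside a published range: consistent with a declared crawler.
print(is_declared_crawler("203.0.113.7"))   # True
# Outside every published range: the user-agent claim is suspect.
print(is_declared_crawler("192.0.2.55"))    # False
```

This check only works when operators publish their ranges and keep their crawlers inside them, which is exactly the transparency the allegations suggest is missing.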
