
Reddit Sues Perplexity for Alleged Industrial-Scale Data Scraping

Reddit AI Perplexity Data Scraping Legal OpenAI Tech Law
October 22, 2025
Viqus Verdict: 8
Data Wars
Media Hype 7/10
Real Impact 8/10

Article Summary

Reddit has filed a lawsuit against Perplexity, SerpApi, Oxylabs, and AWMProxy, accusing them of industrial-scale data scraping to train AI models. The lawsuit claims that Perplexity, a competitor, uses these data scraping companies to obtain Reddit’s vast trove of user-generated content, circumventing Reddit’s protections and ignoring previous cease-and-desist letters. Reddit argues that Perplexity’s ‘answer engine’ relies heavily on this allegedly stolen data, and that the defendants use deceptive practices to mask their identities and bypass security measures. The case reflects a growing trend of AI companies acquiring large training datasets, often through questionable means. Reddit’s data, spanning billions of posts and conversations, is considered highly valuable for AI model development. Reddit’s earlier API changes were aimed at monetizing this data, and the lawsuit casts the data scraping companies as ‘would-be bank robbers’ determined to take it without paying. Perplexity contends that it respects Reddit’s robots.txt and only uses publicly available information, but the volume of Reddit citations on its platform has reportedly increased since the initial cease-and-desist letter.

Key Points

  • Reddit is suing Perplexity and several data scraping companies for allegedly obtaining its content illegally to train AI models.
  • Reddit alleges that Perplexity kept using these scrapers despite a previous cease-and-desist letter, and that Perplexity’s ‘answer engine’ relies on stolen data.
  • The lawsuit underscores a growing trend of AI companies aggressively seeking large datasets, often bypassing established security protocols.

Why It Matters

This lawsuit is a significant development in the ongoing battle between content platforms and AI developers. It highlights the ethical and legal challenges surrounding data acquisition for AI training and raises concerns about the potential for intellectual property infringement and the exploitation of user-generated content. This case has broader implications for the future of online content and how it’s used to train artificial intelligence, particularly as AI models become increasingly reliant on vast quantities of data. For professionals, this news underscores the increasing legal and regulatory scrutiny surrounding AI development and deployment, and the need for robust data governance policies.
