Reddit Sues Perplexity for Alleged Industrial-Scale Data Scraping
8
What is the Viqus Verdict?
We evaluate each news story based on its real impact versus its media hype to offer a clear and objective perspective.
AI Analysis:
While the immediate media impact is high due to the involved parties and the AI narrative, the long-term impact will be felt across the entire tech industry as data governance and AI ethics become increasingly central to development.
Article Summary
Reddit has filed a lawsuit against Perplexity, SerpApi, Oxylabs, and AWMProxy, accusing them of industrial-scale data scraping to train AI models. The lawsuit claims that Perplexity, a competitor, uses these data scraping companies to obtain Reddit’s vast trove of user-generated content, circumventing Reddit’s protections and ignoring previous cease-and-desist letters. Reddit argues that Perplexity’s ‘answer engine’ relies heavily on this stolen data, and that the defendants use deceptive practices to mask their identities and bypass security measures. The lawsuit highlights a growing trend of AI companies seeking large training datasets, often through questionable means. Reddit’s data, representing billions of posts and conversations, is considered highly valuable for AI model development. Reddit’s earlier API changes were aimed at monetizing this data, and the lawsuit casts the data scraping companies as ‘would-be bank robbers’ determined to steal it. Perplexity contends that it respects Reddit’s robots.txt and only uses publicly available information, but the volume of Reddit citations on its platform has increased since the initial letter.
Key Points
- Reddit is suing Perplexity and several data scraping companies for allegedly obtaining its content illegally to train AI models.
- Reddit alleges Perplexity is using these scrapers despite a previous cease-and-desist letter and claims that Perplexity’s ‘answer engine’ relies on stolen data.
- The lawsuit underscores a growing trend of AI companies aggressively seeking large datasets, often bypassing established security protocols.