What just happened? It’s not just OpenAI that keeps being sued for allegedly scraping content to train its systems. Reddit is suing Anthropic over claims it scraped content generated by forum users to train its Claude chatbot without consent.
In a complaint filed in San Francisco this week, Reddit claims that Anthropic intentionally trained its LLMs on content created by Reddit users without requesting consent, thereby violating Reddit’s user agreement.
The filing adds that Anthropic accessed Reddit more than 100,000 times, despite the AI firm stating that its bots had been blocked from scraping data from the site. It adds that Anthropic has been training its Claude chatbot on Reddit data since at least December 2021. The suit actually includes a screenshot appearing to show Claude acknowledging that it was trained on Reddit user data.
Reddit signed agreements with Google and OpenAI in 2024, worth $60 million and $70 million per year respectively, that allow them access to posts created by Reddit’s 100+ million daily users. This allows the posts to appear in answers provided by the companies’ respective chatbots. But Anthropic allegedly prefers not to pay for this sort of thing.
“Anthropic refused to engage,” the complaint states. “Thus, while other AI giants have entered into licensing agreements and agreed to respect users’ choices (including by deleting posts that Redditors chose to delete), Anthropic has not.”
Reddit challenges Anthropic’s assertion that it is “the white knight of the AI industry.”
“This case is about the two faces of Anthropic: the public face that attempts to ingratiate itself into the consumer’s consciousness with claims of righteousness and respect for boundaries and the law, and the private face that ignores any rules that interfere with its attempts to further line its pockets.”
Reddit CEO Steve Huffman previously complained that blocking companies unwilling to pay for data harvesting has been “a real pain in the ass,” which is why it changed its robots.txt file to exclude bots and crawlers that didn’t have permission to access its data.
Anthropic said that it disagrees with Reddit’s claims and will defend itself vigorously.
A Reddit spokesperson said, “We believe in the Open Internet – that does not give Anthropic the right to scrape Reddit content unlawfully, exploit it for billions of dollars in profit, and disregard the rights and privacy of our users.”
“This isn’t a misunderstanding, it’s a sustained effort to extract value from Reddit while ignoring legal and ethical boundaries.”
“We’re filing this lawsuit in line with our Public Content Policy and as our final option to force Anthropic to stop its unlawful practices and abide by its claimed values.”
Anthropic is no stranger to lawsuits. In 2024, writers and journalists Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a class-action against the firm over claims it used pirated copies of their work to train Claude.
Anthropic has also been fighting a case against Universal Music that started in 2023. The AI firm is alleged to have used lyrics from at least 500 songs by musicians, including Beyoncé, the Rolling Stones and the Beach Boys, to train Claude without permission. In March, a judge rejected a preliminary bid from the music publishers to block their lyrics from being used this way.
Reddit vs. Anthropic: An Expert’s Take on the AI Data Scraping Lawsuit
Is AI Data Scraping Ethical? Legal? An Expert Weighs In on the Reddit Lawsuit Against Anthropic
The debate around AI data scraping is heating up, and the recent lawsuit filed by Reddit against Anthropic is a prime example. We sat down with Dr. Evelyn Hayes, a leading expert in AI ethics and intellectual property law, to unpack the details of this case and its implications for the future of AI development.
Time.news Editor: Dr. Hayes, thank you for joining us. Let’s dive straight in. Reddit is suing Anthropic for allegedly scraping its data to train the Claude chatbot without permission. Can you break down the core of this lawsuit for our readers?
Dr. Evelyn Hayes: Certainly. The heart of the matter is that Reddit claims Anthropic intentionally used Reddit user-generated content to train its large language models (LLMs) without obtaining consent or a licensing agreement. Reddit argues this violates its user agreement and represents a blatant disregard for the platform’s data policies.
Time.news Editor: The article mentions Reddit has licensing agreements with Google and OpenAI, allowing them to use Reddit data. Is this a standard practice? Why didn’t Anthropic pursue a similar path?
Dr. Evelyn Hayes: Yes, licensing agreements are becoming increasingly common as AI companies recognize the value and importance of high-quality training data. Reddit’s agreements with Google and OpenAI, reportedly worth $60 million and $70 million per year respectively, demonstrate this. According to the complaint, Anthropic simply “refused to engage” in licensing discussions, possibly preferring to take a riskier approach. This lawsuit suggests that Reddit is taking a firm stand against unauthorized AI data scraping.
Time.news Editor: The lawsuit claims Anthropic accessed Reddit over 100,000 times after supposedly being blocked. That sounds quite deliberate. What are the potential legal ramifications for Anthropic if these claims are proven true?
Dr. Evelyn Hayes: If proven true, Anthropic could face significant financial penalties, including damages for copyright infringement and breach of contract. More broadly, it could set a precedent that significantly hinders unauthorized AI data scraping, perhaps forcing other AI developers to rethink their data acquisition strategies and budget for licensing agreements.
Time.news Editor: Reddit’s statement is quite strong, accusing Anthropic of having “two faces” and prioritizing profit over ethical considerations. Is this just legal rhetoric, or does it reflect a broader tension within the AI industry?
Dr. Evelyn Hayes: I think it underscores a real tension. There’s increasing pressure on AI companies to act ethically and transparently, especially regarding their data sourcing. Companies claiming to be ethical leaders must be held to a higher standard. Reddit clearly feels Anthropic’s actions don’t align with its public image as the “white knight of the AI industry.”
Time.news Editor: The article also mentions that Anthropic is facing other lawsuits related to copyright infringement, including claims of using pirated books and song lyrics. Does this paint a concerning picture for the company?
Dr. Evelyn Hayes: It does. These multiple lawsuits suggest a pattern of questionable data practices. It raises questions about Anthropic’s internal compliance and its overall approach to respecting intellectual property rights. The AI data scraping debate is not isolated to Reddit; it’s impacting various creators.
Time.news Editor: What advice would you give to companies building AI models to avoid similar legal challenges?
Dr. Evelyn Hayes: My advice is simple: prioritize ethical and legal compliance from the outset. This means:
Obtain explicit consent: Whenever possible, get permission to use data from individuals and organizations.
Secure licensing agreements: Negotiate and pay for the right to use copyrighted material.
Be transparent: Clearly disclose the sources of data used to train your models.
Respect robots.txt: Adhere to website owners’ instructions regarding website crawling and AI data scraping.
Implement opt-out mechanisms: Allow users to easily remove their data from training datasets.
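As a rough illustration of the robots.txt point, a crawler can check a site's rules before requesting any page. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleAIBot` and `SomeOtherBot` user agents, the `example.com` URLs, and the helper names `build_parser` and `may_fetch` are all hypothetical, chosen only for illustration; they do not reflect Reddit's actual file or any company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the parsed rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleAIBot", "https://example.com/r/python/"),
        ("SomeOtherBot", "https://example.com/r/python/"),
        ("SomeOtherBot", "https://example.com/private/data"),
    ]:
        verdict = "allowed" if may_fetch(rules, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

A real crawler would load the live file with `set_url()` and `read()` instead of a hard-coded string, and would skip any URL for which the check returns False. Note that robots.txt is a voluntary convention: passing this check does not by itself establish consent or a license to use the content.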
Time.news Editor: For our readers who are content creators or users of online platforms, what can they do to protect their work from unauthorized AI scraping?
Dr. Evelyn Hayes:
Understand platform policies: Familiarize yourself with the terms of service and data policies of the platforms you use.
Adjust privacy settings: Utilize available privacy controls to limit the visibility of your content.
Consider watermarking: Add watermarks to your images and text to make it more difficult for AI models to use your content without attribution.
Advocate for stronger regulations: Support policies that require AI companies to obtain consent before using data for training purposes.
Time.news Editor: Dr. Hayes, thank you for providing such valuable insights into this complex issue. It’s clear that the Reddit-Anthropic lawsuit is a pivotal moment in the ongoing conversation about AI ethics, AI data scraping, and intellectual property rights.
