AI Copyright Lawsuit: Authors vs. Chatbots

by Priyanka Patel

A California federal court became the site of a legal challenge Monday as a group of U.S. reporters, including New York Times reporter John Carreyrou, sued several leading artificial intelligence companies over the alleged use of copyrighted books to train their AI systems.

Authors Accuse AI Firms of Copyright Infringement

The lawsuit alleges that tech companies bypassed licensing agreements and author compensation by utilizing pirated literary works in the development of their large language models (LLMs).

  • The lawsuit names OpenAI, Google, Elon Musk’s xAI, Meta Platforms, and Perplexity as defendants.
  • Plaintiffs claim AI companies accessed copyrighted material through “shadow libraries” like LibGen and Z-Library.
  • The writers are intentionally avoiding a class-action lawsuit, preferring individual jury assessments of their claims.
  • This marks the first lawsuit of its kind to name xAI as a defendant.

Carreyrou, known for his reporting in “Bad Blood,” filed the suit alongside five other writers, accusing the companies of outright piracy to power the AI chatbots that are rapidly changing the digital landscape. The complaint states, “This case concerns a straightforward and purposeful act of theft that constitutes copyright infringement.”

The writers allege the AI companies didn’t just stumble upon these books; they actively sought out pirated copies through online repositories such as LibGen, Z-Library, and OceanofPDF. These illicit copies were then reportedly integrated into the AI systems to accelerate their development, impacting perhaps hundreds of authors, including Pulitzer Prize winners and bestselling novelists.

Did you know? – Large language models (LLMs) require massive datasets to function, often consisting of billions of words. This need for data has raised concerns about copyright and fair use.

What’s the difference between this case and others? Unlike other ongoing copyright cases, these writers are deliberately forgoing a class-action approach. They argue that class-action settlements often result in minimal payouts for authors, allowing defendants to resolve claims quickly and cheaply.

“LLM companies should not be able to so easily extinguish thousands upon thousands of high-value claims at bargain-basement rates,” the complaint asserts. The plaintiffs believe individual assessments by a jury are crucial to accurately reflect the extent of the alleged infringement.

The lawsuit comes on the heels of a significant settlement in August, in which Anthropic agreed to pay $1.5 billion to authors claiming similar copyright violations. However, the plaintiffs in the current case contend that individual class members in that settlement will receive a mere 2% (about $3,000) of the potential $150,000 per infringed work allowed under the Copyright Act, a “tiny fraction,” as they describe it.

Pro tip – Copyright law protects original works of authorship, including literary works. Using copyrighted material without permission, even for AI training, can constitute infringement.

AI firms are increasingly arguing that their systems generate new, transformative outputs rather than simply reproducing original works. A previous ruling did find Anthropic’s use of copyrighted books for AI training to be fair use, but also determined that the company violated copyright law by storing millions of pirated books in a central database, regardless of whether they were ultimately used for training.
