Publishers and Scott Turow Sue Meta Over Llama AI Training Data
#Regulation

AI & ML Reporter

Five major publishing houses and bestselling author Scott Turow have filed a class action lawsuit against Meta, alleging the company illegally used millions of copyrighted books to train its Llama AI language model. The suit marks another major legal challenge in the ongoing battle between content creators and AI developers.

The publishing industry has escalated its fight against AI companies with a new class action lawsuit targeting Meta and CEO Mark Zuckerberg. Filed in federal court in Manhattan, the lawsuit alleges that Meta illegally used millions of copyrighted works to train its Llama AI language system, potentially setting a precedent for how AI models are trained on existing creative content.

The plaintiffs—Elsevier, Cengage, Hachette Book Group, Macmillan, and McGraw Hill—accuse Meta of copyright infringement on an unprecedented scale. According to the complaint, Meta "reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law." The lawsuit specifically names Zuckerberg as having "personally authorized and actively encouraged the infringement."

This legal action marks a significant escalation in the conflict between content creators and AI developers. The publishing houses involved are among the largest in the industry, and the inclusion of bestselling author Scott Turow lends additional weight to the case. Authors published by these companies include high-profile names such as James Patterson, Donna Tartt, former President Joe Biden, and recent Pulitzer Prize winners Yiyun Li and Amanda Vaill.

Meta CEO Mark Zuckerberg arrives for a landmark trial over whether social media platforms deliberately addict and harm children, Wednesday, Feb. 18, 2026, in Los Angeles. (AP Photo/Ryan Sun, File)

Meta has responded defiantly, vowing to "fight this lawsuit aggressively" and asserting that "training AI on copyrighted material can qualify as fair use." This legal argument centers on the concept of fair use, which allows limited use of copyrighted material without permission for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. However, the extent to which training large-scale AI models qualifies as fair use remains an unsettled question in legal circles.

The lawsuit comes at a time when the AI industry faces increasing scrutiny over data sourcing practices. In 2025, AI company Anthropic agreed to pay $1.5 billion to settle a similar class action suit initiated by thriller novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson. That settlement suggests that courts may be increasingly sympathetic to creators' concerns about unauthorized use of their work.

What's particularly notable about this case is its focus on Meta's Llama model, which has gained significant attention in the AI community. Unlike some other AI models that have faced legal challenges, Llama is released as an openly available model, which has contributed to its widespread adoption. That openness also invites closer outside scrutiny of how the model was built, potentially strengthening the plaintiffs' case.

The legal questions at the heart of this case are complex and could have far-reaching implications for the AI industry. Key issues include whether the scale of data used in training AI models transforms the nature of the use, whether the commercial nature of AI development weighs against fair use claims, and whether the availability of licensing options should influence fair use determinations.

This lawsuit is part of a broader pattern of legal challenges facing AI companies. From visual artists and photographers to news organizations and software developers, creators across multiple industries are increasingly asserting their rights in the face of AI's rapid advancement. The outcome of this case could establish important precedents for how AI companies source and compensate for the data used in their models.

For Meta, this represents another legal challenge in a year that has already seen the company facing scrutiny over its AI practices. The timing of the lawsuit, coming shortly after Meta's annual developer conference LlamaCon, suggests that the publishing industry may be timing its legal actions to maximize impact.

A Meta logo is shown on a video screen at LlamaCon 2025, an AI developer conference, in Menlo Park, Calif., April 29, 2025. (AP Photo/Jeff Chiu, File)

As the case progresses, it will be closely watched by both the AI and publishing industries. The legal battle highlights the fundamental tension between technological innovation and intellectual property rights—a tension that will likely continue to shape the development of AI in the coming years.

Meta has positioned itself as a leader in open-source AI development, with Llama models being widely used by researchers and developers. The company has argued that open-source AI promotes innovation and accessibility, but this lawsuit suggests that the company may need to address concerns about how these models are trained.

The publishing industry has been particularly vocal about AI's impact on its business models. Beyond the legal challenges, publishers are grappling with how AI might change reading habits, author compensation structures, and the fundamental nature of creative work. This lawsuit represents both a defensive action to protect existing copyrights and a proactive stance in shaping how AI interacts with creative content in the future.

Legal experts suggest that this case could take years to resolve, potentially reaching the Supreme Court. In the meantime, AI companies may face increasing pressure to develop more transparent and ethical approaches to data sourcing, while creators will continue to seek mechanisms to ensure they are fairly compensated for their work in the age of AI.

The outcome of this case will likely influence not just Meta's practices but the broader AI industry's approach to training data. As AI models become more sophisticated and widely deployed, the question of how to balance innovation with creators' rights will remain a central challenge for the technology sector.
