Stanford researchers demonstrated that copyrighted books can be extracted verbatim from leading production language models, bypassing safety measures.

A team from Stanford University has demonstrated that copyrighted literary works can be extracted verbatim from production-grade language models, challenging assumptions about memorization safeguards. In their paper "Extracting books from production language models," researchers Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, and Percy Liang report that four major commercial LLMs (Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3) remain vulnerable to training-data extraction despite their built-in safeguards.
The core issue stems from unresolved legal debates over whether LLMs memorize copyrighted material during training and can reproduce it at inference time. Earlier studies showed that open-weight models can leak training data; this research confirms that production systems, which layer safety filters and alignment training on top of the base model, still reproduce significant verbatim content. The team developed a two-phase extraction method: first, a probe tests whether extraction is feasible at all, in some cases using Best-of-N jailbreak techniques to get past refusals; second, iterative continuation prompts pull out sequential blocks of text, scored with nv-recall, a metric the authors introduce that measures the percentage of a source work's content blocks recovered in the model's output.
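To make the workflow concrete, here is a minimal sketch of such a two-phase pipeline, written against assumptions rather than the paper's code: the `query_model` wrapper, the keyword-based refusal check, the perturbation used for the Best-of-N-style probe, and the prompt wording are all hypothetical.

```python
import random
import string

N_JAILBREAK_TRIES = 32   # assumed Best-of-N budget for the probe phase
CONTEXT_WORDS = 50       # assumed amount of prior output fed back each turn


def query_model(prompt: str) -> str:
    """Placeholder for a call to a production chat-model API (hypothetical)."""
    raise NotImplementedError("wire up the provider's chat API here")


def is_refusal(response: str) -> bool:
    """Crude keyword check for refusals; a real evaluation would use a classifier."""
    return any(kw in response.lower() for kw in ("i can't", "i cannot", "copyright"))


def bon_perturb(prompt: str) -> str:
    """Best-of-N-style random perturbation: jitter capitalization and inject a
    stray character so every retry sends a slightly different prompt."""
    chars = [c.upper() if random.random() < 0.3 else c.lower() for c in prompt]
    chars.insert(random.randrange(len(chars) + 1), random.choice(string.ascii_letters))
    return "".join(chars)


def probe(seed_prompt: str) -> str | None:
    """Phase 1: test whether extraction is feasible, resampling perturbed
    prompts until the model stops refusing or the budget runs out."""
    for _ in range(N_JAILBREAK_TRIES):
        response = query_model(bon_perturb(seed_prompt))
        if not is_refusal(response):
            return response
    return None


def extract(seed_prompt: str, max_turns: int = 200) -> str:
    """Phase 2: repeatedly ask the model to continue from the tail of its own
    output, accumulating a transcript until it refuses or falls silent."""
    transcript = probe(seed_prompt)
    if transcript is None:
        return ""
    for _ in range(max_turns):
        tail = " ".join(transcript.split()[-CONTEXT_WORDS:])
        response = query_model(f"Continue this passage exactly:\n{tail}")
        if not response.strip() or is_refusal(response):
            break
        transcript += " " + response
    return transcript
```

A real evaluation would also need rate limiting, retries, and de-duplication of overlapping continuations; the sketch only conveys the control flow described above. An approximation of the nv-recall metric itself follows the results list below.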
Results varied significantly by model:
- Gemini 2.5 Pro and Grok 3 required no jailbreaking, yielding 76.8% and 70.3% nv-recall respectively for Harry Potter and the Sorcerer's Stone.
- Claude 3.7 Sonnet produced a near-complete copy of the book (95.8% nv-recall) after jailbreaking.
- GPT-4.1 resisted extraction most strongly, requiring 20 times more jailbreak attempts than the other models and capping at 4.0% nv-recall before it refused to continue.
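The nv-recall figures above can be read as the share of a book's content blocks that the extracted transcript reproduces. A rough approximation, assuming the source is split into fixed-size word blocks and a block counts as recovered when it appears verbatim in the transcript (the paper's exact block size and matching rule may differ), could look like this:

```python
BLOCK_WORDS = 50  # assumed block size; the paper's granularity may differ


def nv_recall(extracted: str, source: str) -> float:
    """Fraction of fixed-size source blocks found verbatim in the extracted text."""
    words = source.split()
    blocks = [" ".join(words[i:i + BLOCK_WORDS])
              for i in range(0, len(words), BLOCK_WORDS)]
    extracted_norm = " ".join(extracted.split())
    hits = sum(1 for block in blocks if block in extracted_norm)
    return hits / max(len(blocks), 1)
```

Under this reading, a 76.8% score means roughly three-quarters of the book's blocks were recovered.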
These findings carry substantial implications. First, they validate copyright holders' concerns about unauthorized reproduction of their works. Second, they expose limitations in current safety mechanisms: alignment techniques appear insufficient against determined extraction attacks. Third, the variation in vulnerability across models suggests underlying differences in how memorization is suppressed. As one researcher noted, "Even state-of-the-art safeguards cannot eliminate extraction risk entirely."
The team disclosed its findings to the affected providers in late 2025 under a 90-day responsible-disclosure window before publication. Their methodology offers a repeatable framework for evaluating memorization risk, which could inform future model development. With copyright lawsuits against LLM developers ongoing, the research provides empirical evidence that could shape both legal precedent and technical countermeasures. The full paper includes implementation details and quantitative comparisons across models.
