The explosive growth of generative AI faces a pivotal legal reckoning: a report commissioned by the European Parliament concludes that training models on copyrighted books, art, and music without compensation violates copyright law, undercutting the "fair use" defense embraced by tech firms. Written for the Committee on Legal Affairs, the study asserts that the EU's current text-and-data mining exceptions were never designed for AI's synthetic outputs, and warns of systemic copyright erosion unless the gap is urgently addressed.

Why the 'Human Learning' Analogy Falls Short

Tech companies often justify scraping copyrighted material by comparing it to a student reading a book. The study categorically rejects this:

"While it is often suggested that AI systems 'learn' in ways similar to humans... this analogy is misleading from a legal perspective. When generative AI models are trained on protected content, they typically make copies and process the actual expressions found in those works. This goes beyond what is permitted under current legal exceptions."

Crucially, AI lacks human understanding: it follows statistical patterns without engaging with meaning, a distinction philosopher Luciano Floridi emphasizes and one that carries weight in copyright law. Unlike human cognition, AI training replicates works into derivative datasets that compete directly with the originals.

Proposed EU Overhaul: Pay Creators and Redefine Protections

The paper advocates a dual approach:
1. A new statutory exception specifically for GenAI training under EU law.
2. An unwaivable right to equitable remuneration for creators whose works fuel these systems, ensuring artists and writers receive payment.

It also clarifies that purely AI-generated outputs should remain unprotected, while human-AI collaborations need standardized safeguards—a nod to mounting disputes over ownership in AI-assisted content.

Global Ripples and Mounting Legal Fires

This EU intervention intensifies pressure worldwide. In the U.S., ousted Copyright Office chief Shira Perlmutter recently argued that fair use doesn’t cover commercial-scale scraping to produce competing AI content. Her stance mirrors ongoing lawsuits, including Disney and Universal’s claim that Midjourney is a "bottomless pit of plagiarism" that replicates iconic characters. As courts grapple with these cases, developers face a fragmented landscape: stricter EU rules could force costly licensing or technical pivots, while U.S. outcomes remain uncertain.

For AI builders, the implications are stark. Reliance on unattributed training data now carries legal and financial peril, potentially slowing innovation or pushing models toward licensed datasets. Content creators, long sidelined in the AI boom, may finally gain leverage, but only if policymakers act. As regulatory fault lines deepen, the days of unchecked data scraping appear numbered.

Source: The Register, Lindsay Clark, July 14, 2025