A federal judge has rejected Nvidia’s motion to dismiss a class action copyright lawsuit alleging the company used more than 197,000 pirated ebooks to train its NeMo Megatron AI framework, ruling that specific scripts in the tool were designed solely to facilitate copyright infringement rather than serve general-purpose development needs.

Nvidia has failed to dismiss a class action copyright infringement lawsuit alleging the company used more than 197,000 pirated ebooks to train large language models via its NeMo Megatron Framework. U.S. District Judge Jon Tigar denied Nvidia’s motion to dismiss, finding that specific scripts included in the NeMo tool were designed solely to facilitate copyright infringement rather than serve general-purpose development needs.
The lawsuit was filed by a group of authors who allege Nvidia sourced training data from Bibliotik, a private ebook torrent tracker hosting more than 197,000 unauthorized titles. That collection was incorporated into the Books3 dataset, which in turn was included in The Pile, an 800+ gigabyte open-source text dataset widely used for LLM training. Per the complaint, Nvidia used The Pile to train models built on the NeMo framework.
Nvidia’s defense relied on the U.S. Supreme Court’s ruling in Cox v. Sony, which established that service providers are not liable for user-led piracy if their tools have substantial non-infringing uses. The company argued the NeMo Megatron Framework is a general-purpose AI training tool with many legitimate applications, including training on public datasets and fine-tuning open-source models, and that it did not market the framework as a piracy tool. Nvidia also claimed it qualifies for safe harbor protections extended to internet service providers under copyright law.
Judge Tigar rejected this argument, writing that the case centers not on the NeMo framework as a whole, but on specific scripts included in the tool’s data preprocessing pipeline. These scripts automate the download and preprocessing of The Pile dataset, which the plaintiffs argue is dominated by copyrighted material. “The scripts are alleged to have no other purpose than to speed up the process of infringement, unlike the digital video recorder systems at issue in Sony Corp. or the internet service provided in Cox,” Judge Tigar wrote. The ruling allows the class action lawsuit to proceed to discovery and potential trial, with no date set for the next hearing.
Technical Specifications and Supply Chain Context
The NeMo Megatron Framework is a core component of Nvidia’s AI software stack, designed to optimize large language model training on the company’s data center GPUs. The framework includes pre-built modules for tokenization, model architecture, and training orchestration, with specific optimizations for Nvidia’s Hopper architecture GPUs, including the H100 Tensor Core GPU.
The H100 is fabricated on TSMC’s 4nm process node, featuring 80 billion transistors and 80GB of HBM3 high-bandwidth memory. It delivers 1,979 teraflops of FP8 compute performance and 3.35TB/s of memory bandwidth, and can train a 1-trillion-parameter LLM three times faster than the previous-generation A100 GPU, which was built on TSMC’s 7nm process. These performance gains are a key reason enterprises adopt NeMo, as the framework reduces training time and compute costs for custom LLMs.
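The training-cost argument above is proportional arithmetic. As a back-of-envelope sketch (the GPU-hour counts and hourly rates below are illustrative assumptions, not quoted prices), a claimed 3x speedup can cut total cost even when the faster GPU rents for more per hour:

```python
# Assumed job size and rental rates -- illustrative numbers only.
a100_gpu_hours = 9_000                # hypothetical training job on A100s
h100_gpu_hours = a100_gpu_hours / 3   # per the claimed 3x speedup
a100_rate = 2.0                       # assumed $/GPU-hour
h100_rate = 4.0                       # assumed $/GPU-hour (pricier per hour)

a100_cost = a100_gpu_hours * a100_rate
h100_cost = h100_gpu_hours * h100_rate
print(a100_cost, h100_cost)  # 18000.0 12000.0
```

Even at double the hourly rate, the faster part finishes the assumed job for a third fewer dollars, which is the economics driving H100 demand.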
Nvidia’s GPU supply chain relies almost entirely on TSMC for wafer fabrication, with 4nm Hopper wafers accounting for 22% of TSMC’s 4nm capacity in 2023, per supply chain analysis firm TrendForce. Lead times for H100 GPUs extended to 11 months in 2023 due to surging demand for AI training hardware, with Nvidia prioritizing allocations for customers using its proprietary software stack, including NeMo. The framework is also integrated with Nvidia’s CUDA parallel computing platform, which has a 15-year head start over competing software ecosystems, creating a moat that ties hardware sales to software adoption.
The disputed scripts in NeMo are part of the framework’s pre-configured dataset connectors, which are designed to work with popular open-source text corpora including The Pile. Nvidia has previously stated that these connectors are compatible with any publicly available dataset, but court filings show the company was aware of The Pile’s inclusion of pirated Books3 content when it added the connector to NeMo in 2022.
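The exact contents of the disputed scripts are not public; court filings describe them only as automating the download and preprocessing of the dataset. Purely as an illustration of what a generic pipeline of that shape does (every name below is hypothetical, not Nvidia’s code), the preprocessing half might look like:

```python
import json

def preprocess_shard(lines, min_chars=32):
    """Convert raw JSONL shard lines into deduplicated training records.

    Hypothetical sketch of a generic text-dataset preprocessing step:
    parse each line, drop documents that are too short or duplicated,
    and emit clean {"text": ...} records ready for tokenization.
    """
    seen = set()
    records = []
    for line in lines:
        doc = json.loads(line)
        text = doc.get("text", "").strip()
        if len(text) < min_chars or text in seen:
            continue  # skip short or duplicate documents
        seen.add(text)
        records.append({"text": text})
    return records

# In a real pipeline the shard would be downloaded and decompressed first;
# here two in-memory lines stand in for a shard so the logic is runnable.
raw = [
    json.dumps({"text": "A sufficiently long paragraph of example training text."}),
    json.dumps({"text": "too short"}),
]
print(len(preprocess_shard(raw)))  # 1
```

The legal question turns on the download half: a connector hard-wired to fetch one particular corpus is what the judge distinguished from a general-purpose tool.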
Market Implications
The ruling has immediate and long-term implications for Nvidia’s core chip business, as well as the broader AI hardware supply chain. Nvidia reported $47.5 billion in data center revenue for fiscal 2024, a 217% increase year over year, with 80% of that revenue tied to H100 GPU sales. NeMo is a key driver of this demand, as enterprises building custom LLMs are 4 times more likely to purchase Nvidia hardware if they use the company’s software stack, per a 2024 survey of 500 AI adopters.
If the class action lawsuit results in an injunction barring Nvidia from using NeMo with pirated datasets, or forces the company to remove the disputed scripts, adoption of the framework could decline. That would reduce demand for H100 and upcoming Blackwell GPUs, which are built on 4NP, a custom version of TSMC’s 4nm process, and deliver up to 2.5 times the training performance of the H100. A 10% decline in NeMo adoption would translate to $4.75 billion in lost data center revenue for Nvidia, per our estimates, freeing up TSMC advanced-node capacity for other customers such as AMD and Apple.
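The $4.75 billion figure is simple proportional arithmetic on Nvidia’s reported fiscal 2024 data center revenue, under the strong simplifying assumption that revenue moves one-for-one with NeMo adoption:

```python
data_center_revenue_b = 47.5   # Nvidia fiscal 2024 data center revenue, $B
adoption_decline = 0.10        # hypothetical 10% drop in NeMo adoption

# Assumes revenue scales linearly with framework adoption -- a strong
# simplification, since much GPU demand is independent of NeMo.
lost_revenue_b = data_center_revenue_b * adoption_decline
print(lost_revenue_b)  # 4.75
```

In practice the relationship is looser: many H100 buyers train with open-source stacks, so the linear estimate is best read as an upper bound on NeMo-attributable revenue risk.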
The ruling also sets a precedent for other AI hardware and software vendors. Meta faces a similar class action lawsuit over its use of pirated books for LLM training, and Google is currently lobbying for fair use protections for AI training data in Australia. For Nvidia, the case highlights a risk to its software moat: if proprietary tools like NeMo are tied to copyright liability, customers may shift to open-source frameworks like PyTorch, which run on both Nvidia and AMD hardware, eroding Nvidia’s 80% market share in data center GPUs.
Supply chain ripple effects are also significant. Nvidia’s 2024 capital expenditure plan includes $12 billion for TSMC wafer allocations, based on projected demand for Hopper and Blackwell GPUs driven by NeMo adoption. A reduction in software demand would lead Nvidia to cut wafer orders, reducing TSMC’s revenue from its advanced process nodes and potentially slowing the foundry’s expansion of leading-edge capacity. That would affect other clients, including Apple, which uses TSMC’s 3nm process for its A17 Pro and M3 chips, as wafer allocations are renegotiated.
The case is expected to take multiple years to resolve, with both sides likely to file additional motions as discovery progresses. For now, Nvidia’s motion to dismiss is denied, and the authors’ class action will move forward.
