Nvidia Paid ‘Tens of Thousands’ for Pirated Books After Being Warned They Were Illegal

NVIDIA Corp. (Nasdaq: NVDA) contacted a controversial online repository of pirated books to obtain high-speed access to copyrighted material for training artificial intelligence models, according to internal company documents filed in federal court.

The correspondence, revealed in an amended complaint filed January 16 in US District Court for the Northern District of California, shows an Nvidia data strategy team member wrote to Anna’s Archive stating the company was “exploring including Anna’s Archive in pre-training data for our LLMs,” or large language models.

Anna’s Archive warned Nvidia about the illegal nature of its collections, according to the court filing. The pirate library asked whether the company had obtained internal authorization before proceeding, noting it had “wasted too much time on people who could not get internal buy-in.”

Within a week of that warning, Nvidia management gave “the green light” to proceed, the complaint alleges. Anna’s Archive then offered access to approximately 500 terabytes of data, which included millions of copyrighted books.

The shadow library charged tens of thousands of dollars for high-speed access to its collections, according to court documents.

Five authors — Abdi Nazemian, Brian Keene, Stewart O’Nan, Andre Dubus III and Susan Orlean — filed the expanded class-action lawsuit. The authors claim Nvidia used their copyrighted works without permission to train AI models including NeMo Megatron and Nemotron-4.

The lawsuit also alleges Nvidia downloaded copyrighted material from other shadow libraries including LibGen, Sci-Hub and Z-Library. Additionally, the complaint claims Nvidia provided scripts and tools that allowed corporate customers to automatically download datasets containing pirated books.

Nvidia previously trained its AI models on the Books3 dataset, which contains approximately 196,640 books copied from the pirate site Bibliotik, according to the complaint. Books3 forms part of a larger dataset called The Pile.

The chip manufacturer defends its actions as fair use under copyright law. Nvidia has argued that AI training on copyrighted material differs from traditional copying because the models use books as statistical data rather than reproducing them directly.

The case marks the first time correspondence between a major US technology company and Anna’s Archive has been publicly revealed in court proceedings, according to copyright news site TorrentFreak, which first reported the internal emails.

The authors seek statutory damages, actual damages and compensation for what they describe as willful copyright violations. Hundreds of additional authors whose works appear in the pirated libraries could join the class-action suit.

Anna’s Archive describes itself as a preservation project aiming to catalog all books in existence and make them freely available. Copyright holders and publishers characterize the site as a piracy operation that undermines intellectual property rights.

Other major AI companies including Meta and Anthropic have also faced lawsuits alleging they trained models on pirated books from shadow libraries.

The case is Nazemian et al. v. NVIDIA Corporation, Case No. 4:24-cv-01454-JST, in the US District Court for the Northern District of California.



Information for this story was found via the sources and companies mentioned. The author has no securities or affiliations related to the organizations discussed. Not a recommendation to buy or sell. Always do additional research and consult a professional before purchasing a security. The author holds no licenses.
