NVIDIA Corp. (Nasdaq: NVDA) contacted a controversial online repository of pirated books to obtain high-speed access to copyrighted material for training artificial intelligence models, according to internal company documents filed in federal court.
The correspondence, revealed in an amended complaint filed January 16 in US District Court for the Northern District of California, shows that a member of Nvidia’s data strategy team wrote to Anna’s Archive saying the company was “exploring including Anna’s Archive in pre-training data for our LLMs,” or large language models.
Anna’s Archive warned Nvidia about the illegal nature of its collections, according to the court filing. The pirate library asked whether the company had obtained internal authorization before proceeding, noting it had “wasted too much time on people who could not get internal buy-in.”
Within a week of that warning, Nvidia management gave “the green light” to proceed, the complaint alleges. Anna’s Archive then offered access to approximately 500 terabytes of data, including millions of copyrighted books.
The shadow library charged tens of thousands of dollars for high-speed access to its collections, according to court documents.
Five authors — Abdi Nazemian, Brian Keene, Stewart O’Nan, Andre Dubus III and Susan Orlean — filed the expanded class-action lawsuit. The authors claim Nvidia used their copyrighted works without permission to train AI models including NeMo Megatron and Nemotron-4.
The lawsuit also alleges Nvidia downloaded copyrighted material from other shadow libraries, including LibGen, Sci-Hub and Z-Library. The complaint further claims Nvidia provided scripts and tools that allowed corporate customers to automatically download datasets containing pirated books.
Nvidia previously trained its AI models on the Books3 dataset, which contains approximately 196,640 books copied from the pirate site Bibliotik, according to the complaint. Books3 forms part of a larger dataset called The Pile.
The chip manufacturer defends its actions as fair use under copyright law. Nvidia has argued that AI training on copyrighted material differs from traditional copying because the models use books as statistical data rather than reproducing them directly.
The case marks the first time correspondence between a major US technology company and Anna’s Archive has been publicly revealed in court proceedings, according to copyright news site TorrentFreak, which first reported the internal emails.
The authors seek statutory damages, actual damages and compensation for what they describe as willful copyright violations. Hundreds of additional authors whose works appear in the pirated libraries could join the class-action suit.
Anna’s Archive describes itself as a preservation project aiming to catalog all books in existence and make them freely available. Copyright holders and publishers characterize the site as a piracy operation that undermines intellectual property rights.
Other major AI companies including Meta and Anthropic have also faced lawsuits alleging they trained models on pirated books from shadow libraries.
The case is Nazemian et al. v. NVIDIA Corporation, Case No. 4:24-cv-01454-JST, in the US District Court for the Northern District of California.
Information for this story was found via the sources and companies mentioned. The author has no securities or affiliations related to the organizations discussed. Not a recommendation to buy or sell. Always do additional research and consult a professional before purchasing a security. The author holds no licenses.