Legal

Zuckerberg caught pirating?

Meta CEO Mark Zuckerberg has been accused of allowing his team to use pirated books to train the company's AI model, Llama

Amanda Greenwood
January 10, 2025

New evidence has emerged in the copyright infringement lawsuit Kadrey v. Meta, originally filed in 2023 by comedian Sarah Silverman, author Ta-Nehisi Coates, and others, who accuse Meta of using their work to train its models without permission. The evidence alleges that Meta trained older versions of its large language model, Llama, on pirated, copyrighted books, and that CEO Mark Zuckerberg not only knew about it but approved it.

Evidence suggests that Meta used the “links aggregator” platform LibGen, which provides free, downloadable PDF copies of published, copyrighted books, to train its models. It did so even though LibGen had been sued and fined several times by publishers, including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education, for this illegal activity.

The Llama development team expressed concern over using a “data set we know to be pirated,” and flagged that its use “may undermine Meta’s negotiating position with regulators.” Zuckerberg allegedly ignored these worries and approved the use of pirated material to train Llama.

The evidence also showed that a Meta engineer, Nikolay Bashlykov, was asked to write a script to remove all copyright information, including the words “copyright” and “acknowledgments,” from the e-books downloaded from LibGen and used for training.

Meta is defending this activity by claiming it was operating under the fair use doctrine in US law, which allows copyrighted material to be used without permission in certain circumstances, such as when it is used to make something new and “sufficiently transformative.”