Generative AI has been controversial for its unreliability, high energy consumption, and unauthorized use of copyrighted material. But a recent court case against the AI company Anthropic revealed a more startling detail: the company destroyed millions of physical books to train its AI assistant.

In the case, the judge found that Anthropic had carried out large-scale book destruction to build its language model Claude. The company purchased huge quantities of physical books, cut off their bindings, and scanned the pages, a process that destroyed the originals outright, and it had no intention of making the resulting digital copies public. That processing proved pivotal to the ruling in Anthropic's favor: the judge deemed the digitization sufficiently transformative to qualify as fair use.

However, even though Claude can generate original content from these digitized books, critics note that large language models may still reproduce training material verbatim. Anthropic's legal victory allows it to train AI models on copyrighted books without notifying the original publishers or authors, which could remove a major obstacle facing the generative AI industry.

Notably, a former Meta executive has said that if AI had to fully comply with copyright law, the entire industry might collapse overnight, because developers could not obtain the massive datasets needed to train large language models. Ongoing copyright disputes remain a serious threat to the technology's development. The CEO of Getty Images recently admitted that the company cannot afford to litigate every AI-related copyright infringement. Meanwhile, Disney's lawsuit against the image-generation company Midjourney highlights how readily image generators can reproduce copyrighted content, a case that could reverberate across the entire generative AI ecosystem.