Amid escalating copyright disputes in the AI field, traditional knowledge authorities are losing their patience. This Friday, the globally renowned Encyclopedia Britannica and its subsidiary Merriam-Webster formally filed suit against OpenAI, accusing the company of using their copyrighted materials without authorization for "massive" AI model training.
This is the two institutions' second major legal action, following last year's lawsuit against the AI search company Perplexity. According to the complaint, Encyclopedia Britannica alleges that OpenAI illegally copied nearly 100,000 of its online articles, encyclopedia entries, and dictionary definitions to train its GPT-series large language models.
"Draining" traffic and near-word-for-word "plagiarism"
The plaintiff listed multiple examples in the lawsuit, pointing out that ChatGPT generates copies almost identical to the content of Encyclopedia Britannica when answering user queries. More concerning for publishers is that AI-generated content summaries directly address users' questions in the chat interface, leading to severe "drainage" of traffic that originally belonged to the encyclopedia website, directly harming its traffic-dependent revenue model.
False attribution: a new allegation under the Lanham Act
Beyond copyright infringement, the lawsuit also invokes trademark provisions of the Lanham Act. The plaintiffs allege that ChatGPT sometimes fabricates facts (the so-called "hallucination" phenomenon) and falsely attributes that information to Encyclopedia Britannica. Such misattribution, they argue, not only damages the encyclopedia's authoritative reputation but also leads the public to mistakenly believe its content has been officially licensed or endorsed.
The AI industry amid the legal storm
OpenAI, Anthropic, and other AI giants currently face a wave of lawsuits from authors, publishers, and news organizations. Although some judges have found AI training to be "transformative" in character, the use of pirated materials has still been deemed unlawful; Anthropic, for example, agreed to a $1.5 billion settlement over its use of pirated e-books to train its models.
With traditional knowledge authorities now taking legal action, the "black box" practices of generative AI companies, which have long refused to disclose the sources of their training data, face unprecedented scrutiny. The outcome of this lawsuit will help draw the boundary of power between the AI industry and traditional copyright holders.