The long-running copyright infringement lawsuit between The New York Times and OpenAI has reached a significant milestone. According to Ars Technica, the federal judge presiding over the case has authorized The New York Times and its co-plaintiffs, the New York Daily News and the Center for Investigative Reporting, to search OpenAI's user logs, including deleted conversations, to determine the scope of the alleged infringement.
The New York Times argues that ChatGPT users who bypass its paywall may then delete their chat history, making a broad data sweep necessary. The newspaper further claims that what the log searches turn up could become key evidence in the lawsuit: that OpenAI's large language models (LLMs) were not only trained on its copyrighted material but may also have directly reproduced it. The order was issued last month and was upheld this week after OpenAI challenged it.
OpenAI strongly objects to the ruling. Last month, the company argued that the order would force it to override "longstanding privacy norms." After the latest ruling came down, an OpenAI spokesperson told Ars that the company intends to "keep fighting."
Notably, the ruling comes as publishers like The New York Times negotiate with OpenAI over how the log searches will be conducted. As OpenAI noted in a statement last month, the order covers everything from free ChatGPT logs to more sensitive data from users of its API. (The order specifically exempts logs from ChatGPT Enterprise and from ChatGPT Edu, its version customized for universities.)
Beyond seeking evidence of copyright infringement, the publishers' log strategy may also help prove that ChatGPT dilutes the news market by summarizing articles inside the chatbot, ultimately costing media organizations advertising revenue as potential readers bypass their links entirely. According to Forbes, the content licensing platform TollBit found earlier this year that chatbots from OpenAI, Google, and other companies sent 96 percent less referral traffic to publishers than traditional search engines, a trend that has already begun to harm the news industry.
In this "struggle for survival" between content providers and artificial intelligence companies, evidence of market dilution could tip the scales in favor of copyright holders, as a judge told publishers suing Anthropic last month. The outcome could also carry significant implications for any user attempting to bypass paywalls.