According to Wired, several mainstream media outlets and platforms, including The New York Times, Reddit, and the parent company of USA Today, have recently blocked the Internet Archive's "Wayback Machine." The move is meant to prevent AI companies from indirectly scraping copyrighted content through the archive for model training.

The irony of benefiting while blocking

A recent in-depth USA Today investigation into immigration policy statistics was possible only because of historical data preserved by the "Wayback Machine." Even so, a spokesperson for the media group said it has blocked all crawlers, including ia_archiverbot, to address the growing risk of AI infringement.

Publishers restrict access in different ways

At least 23 mainstream news websites have now imposed restrictions:

  • Complete block: The New York Times and Reddit have blocked the dedicated "Wayback Machine" crawler outright.

  • Interface filtering: The Guardian has not blocked crawlers completely, but it has excluded its content from the Internet Archive's API and filtered it out of the search interface, making its historical archives extremely difficult for users to reach.
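
The report does not say which mechanism each publisher uses. One common way to turn away a specific crawler is a robots.txt rule keyed to its user-agent string, which well-behaved crawlers honor voluntarily. The following is a minimal sketch under that assumption: the robots.txt content, the `is_allowed` helper, and the `news.example` URL are all hypothetical, and only the `ia_archiverbot` name comes from the report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for illustration only.
# It is not copied from any publisher's real configuration.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiverbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://news.example/story"  # placeholder URL
    print("archive crawler allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "ia_archiverbot", story))
    print("other crawler allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", story))
```

A rule like this is advisory: it only affects crawlers that choose to check it, so a publisher that wants stronger enforcement would need server-side measures that this sketch does not cover.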

In response to the publishers' blocking, more than 100 working journalists, including Rachel Maddow, have signed a joint letter of support to the Electronic Frontier Foundation (EFF). They argue that the "Wayback Machine" is an "indispensable tool" for fact-checking, tracking how powerful institutions change their behavior, and preserving the digital historical record.

Publishers argue that AI companies' use of the Internet Archive's vast data for training violates copyright law and puts those companies in direct competition with them. However, Mark Graham, director of the Wayback Machine at the Internet Archive, pointed out that the steady closing off of public web content is seriously weakening society's ability to understand historical truth and conduct public oversight. If the trend continues, a large share of early digital historical records could be lost entirely.