Anthropic's Breakthrough Discovery: Only 250 Malicious Files Can Hack Large AI Models

AIbase基地

Published inAI News · 4 min read · Oct 11, 2025

A key study jointly released by Anthropic, the United Kingdom Artificial Intelligence Safety Institute, and the Alan Turing Institute shows that just 250 poisoned files are sufficient to successfully implant a backdoor in a large language model (LLM), and the effectiveness of this attack is unrelated to the size of the model.

Challenging Conventional Wisdom: A Small Number of Poisoned Data Can Cause Model Failure

The research team tested various models with parameters ranging from 600 million to 13 billion, and found that even larger models trained with cleaner data required the same number of poisoned documents. This discovery overturns a long-standing assumption—that attackers need to control a specific proportion of the training data to compromise the model.

In the experiment, poisoned samples accounted for only 0.00016% of the entire dataset, yet were enough to damage the model's behavior. The researchers trained 72 models of different sizes and tested them using 100, 250, and 500 poisoned documents. The results showed that 250 documents were sufficient to reliably implant a backdoor in models of all sizes, and increasing to 500 did not bring additional attack effects.

Virus, Code (2)

Low-Risk Test: Backdoor Trigger Word "SUDO"

The researchers tested a "denial of service"-style backdoor: when the model encounters a specific trigger word "SUDO", it outputs a string of random, meaningless garbage. Each poisoned document contains normal text, followed by the trigger word, and then some meaningless text.

Anthropic emphasized that this backdoor represents a narrow and low-risk vulnerability, which only causes the model to generate meaningless code and does not pose a significant threat to advanced systems. It remains unclear whether similar methods can be used for more serious vulnerabilities, such as generating unsafe code or bypassing security mechanisms; early studies indicate that executing complex attacks is much more difficult.

The Necessity of Disclosure: Helping Defenders

Although publishing these results carries the risk of encouraging attackers, Anthropic believes that disclosing this information is beneficial to the entire AI community. They point out that data poisoning is an attack type where defenders can gain an advantage, as they can re-examine the dataset and the trained model.

Hacker's New Method: Creating Invisible Backdoor Malware Using OpenAI API

The Microsoft Security Team warned about a new type of malware called SesameOp that uses the OpenAI Assistants API for attacks. The software disguises legitimate cloud services as a covert command and control channel, allowing attackers to remain hidden in victims' systems. This attack method, first discovered in July 2025, highlights new risks of malicious use of cloud services.

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

AI Models Finder

LLM Leaderboard

Model Providers

Compare LLMs

LLM Cost Calculator

LLM Arena

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

AI Brand Monitoring Tool

AI Search Visibility Checker

GEO Services​

AI Model Compatibility Checker

AI Deployment Calculator

Anthropic's Breakthrough Discovery: Only 250 Malicious Files Can Hack Large AI Models

AIbase基地

Challenging Conventional Wisdom: A Small Number of Poisoned Data Can Cause Model Failure

Low-Risk Test: Backdoor Trigger Word "SUDO"

The Necessity of Disclosure: Helping Defenders

This article is from AIbase Daily

AI News Recommendations

New Approach to Universal Computing Power: How Yinfu Cloud Lowers the Barrier for AI Development through K8S Native Architecture

China's First Accessible AI Reading Companion System - Star AI Reading Assistant, Xiaoxing Assists Visually Impaired Children in Reading

Meta Launches DreamGym Framework to Make AI Agent Training More Efficient and Safe

Google Gemini 3 Pro Model Launches on AI Studio, Developers Can Flexibly Adjust Parameters

Yang Likun Criticizes LLMs: Meta AI's Strategy is on the Wrong Track

OpenAI Launches GPT-5.1: A Faster, More Accurate, and More Human-Like Personal AI Assistant

Hacker's New Method: Creating Invisible Backdoor Malware Using OpenAI API

World's First Embodied Intelligence Open Platform Launches! 3D Digital Humans Now Ready to Use Out of the Box: Mofa Xingyun Integrates Large Models into Hundreds of Yuan Chips

Meta Researchers Uncover the Black Box of Large Language Models and Fix AI Reasoning Flaws

MiniMax Open-Source M2 Model: High-Performance AI Empowers Coding and Proxy, Cost is Only 8% of Competitors

GEO Services