Only 250 Documents! Surprising Discovery That AI Models Can Also Be Brainwashed

AIbase基地

Published inAI News · 4 min read · Oct 20, 2025

In a recent joint study, scientists from Anthropic, the UK AI Safety Institute, and the Alan Turing Institute revealed an alarming fact: large language models (such as ChatGPT, Claude, and Gemini) are far less resistant to data poisoning attacks than previously expected. The research shows that attackers only need to insert about 250 contaminated files to implant a "backdoor" in these models, changing how they respond. This finding has sparked a deep reflection on current AI safety practices.

The research team tested AI models of different sizes, with parameters ranging from 6 million to 1.3 billion. Shockingly, attackers could successfully control the model's output by adding only a tiny number of malicious files to the training data. Specifically, for the largest 1.3 billion parameter model, these 250 contaminated files accounted for just 0.00016% of the total training data. However, when the model received specific "trigger phrases," it might output nonsensical text instead of normal, coherent responses. This challenges the traditional belief that larger models are harder to attack.

Artificial intelligence brain, large model

Image source note: The image is AI-generated, and the image licensing service is Midjourney.

Researchers also tried retraining the model using "clean data" repeatedly, hoping to eliminate the impact of the backdoor, but the results showed that the backdoor still existed and could not be completely removed. Although this study mainly focused on simple backdoor behaviors and the tested models had not reached commercial levels, it definitely raises a warning about the security of AI models.

With the rapid development of artificial intelligence, the risk of data poisoning attacks has become particularly prominent. Researchers call on the industry to re-examine and adjust current security practices to better protect AI models. This discovery not only gives us new insights into AI security but also sets higher requirements for future technological development.

Anthropic Study: As Few As 250 Poisoned Files Can Easily Compromise Large AI Models

Anthropic, in collaboration with the UK Institute for AI Safety and other institutions, found that large language models are vulnerable to data poisoning attacks. As few as 250 poisoned files can be used to implant a backdoor. Testing showed that the effectiveness of the attack is not related to the model size (600 million to 13 billion parameters), highlighting the prevalence of AI security vulnerabilities.

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

AI Models Finder

LLM Leaderboard

Model Providers

Submit Your Model

Compare LLMs

LLM Cost Calculator

LLM Arena

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

GEO Services

AI Search Visibility Checker

AI Model Compatibility Checker

AI Dataset Collection

Intelligent Document Recognition

Only 250 Documents! Surprising Discovery That AI Models Can Also Be Brainwashed

AIbase基地

This article is from AIbase Daily

AI News Recommendations

University of Pennsylvania Study Finds: The Ruder the AI is, the Higher the Accuracy Rate

HKU and Meituan Collaborate to Solve AI Math Challenges: CodePlot-CoT Enables Large Models to Think with Code Plotting, Performance Surges by 21%

AI Security Alert: Poisoning a Large Language Model Requires Just 250 Files

Anthropic Study: As Few As 250 Poisoned Files Can Easily Compromise Large AI Models

Anthropic's Breakthrough Discovery: Only 250 Malicious Files Can Hack Large AI Models

OpenAI confirms ChatGPT weekly active users have surpassed 800 million

Exceeding RAG DRAG Technology to Significantly Improve the Accuracy of Large Models

Meta Releases New Model CWM to Aid Code Understanding and Generation

Alibaba Cloud Launches New Security Guard Qwen3Guard, Aimed at Providing Reliable Security for Artificial Intelligence

AI Recruitment Unicorn Juicebox Raises $36 Million in Funding: A 4-Person Team Creates a Ten-Million ARR Miracle, Sequoia Leads the A Round

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

AI Models Finder

LLM Leaderboard

Model Providers

Submit Your Model

Compare LLMs

LLM Cost Calculator

LLM Arena

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

GEO Services​

AI Search Visibility Checker

AI Model Compatibility Checker

AI Dataset Collection

Intelligent Document Recognition

Only 250 Documents! Surprising Discovery That AI Models Can Also Be Brainwashed

AIbase基地

This article is from AIbase Daily

AI News Recommendations

University of Pennsylvania Study Finds: The Ruder the AI is, the Higher the Accuracy Rate

HKU and Meituan Collaborate to Solve AI Math Challenges: CodePlot-CoT Enables Large Models to Think with Code Plotting, Performance Surges by 21%

AI Security Alert: Poisoning a Large Language Model Requires Just 250 Files

Anthropic Study: As Few As 250 Poisoned Files Can Easily Compromise Large AI Models

Anthropic's Breakthrough Discovery: Only 250 Malicious Files Can Hack Large AI Models

OpenAI confirms ChatGPT weekly active users have surpassed 800 million

Exceeding RAG DRAG Technology to Significantly Improve the Accuracy of Large Models

Meta Releases New Model CWM to Aid Code Understanding and Generation

Alibaba Cloud Launches New Security Guard Qwen3Guard, Aimed at Providing Reliable Security for Artificial Intelligence

AI Recruitment Unicorn Juicebox Raises $36 Million in Funding: A 4-Person Team Creates a Ten-Million ARR Miracle, Sequoia Leads the A Round

GEO Services