Recently, a study has drawn attention, showing that large language models (LLMs) can exhibit phenomena similar to human "brain damage" after continuous exposure to low-quality data, leading to a significant decline in reasoning and memory abilities. Researchers found that AI models trained on high-popularity but low-value social media data (such as Twitter) experienced a 23% drop in reasoning ability and a 30% decline in long-context memory. More concerning is that this damage is irreversible; even after subsequent training with high-quality data, the model cannot fully recover to its initial state.

Survey, Data Report

Image source note: The image was generated by AI, and the image licensing service provider is Midjourney

This study was conducted by a group of AI researchers who provided a detailed definition of low-quality data and compared it with high-quality data. They classified low-quality data as "short text and high popularity" content, especially social media posts that include clickbait and trending slang. The study shows that after AI models are exposed to such low-quality data, not only do their cognitive abilities decline, but their personality traits are also affected, showing more narcissistic and psychopathic characteristics.

The research team trained four different large language models with these two types of data. During the study, the core capabilities of the models were evaluated across multiple dimensions, including reasoning, memory, and adherence to ethical standards. The results showed that the principle of "garbage in, garbage out" indeed applies to large language models. This finding raises new warnings for future AI data training.

Researchers believe that the industry must pay attention to data quality when training AI to avoid potential risks from low-quality data. In addition, they recommend conducting baseline tests of cognitive abilities when deploying large models to ensure that AI does not deteriorate due to prolonged exposure to low-quality data.

Key Points:

🧠 After exposure to low-quality data, AI models experience a significant decline in reasoning and memory abilities, and the damage is irreversible.  

📉 After exposure to low-quality data, AI models show more narcissistic and psychopathic traits.  

🔍 The study reminds us to focus on data quality when training AI and to conduct cognitive ability tests.