With the rapid development of artificial intelligence, major AI startups keep claiming that their products will change how work is done and how knowledge is acquired. However, a recent study published by the Royal Society has revealed serious problems with the text summarization capabilities of new-generation AI models, raising concern: the new AI chatbots omitted key information in as many as 73% of cases when summarizing.
Image source note: AI-generated image, licensed from service provider Midjourney
The study analyzed ten widely used large language models (LLMs), including chatbots such as ChatGPT-4o, ChatGPT-4.5, DeepSeek, and LLaMA 3.3 70B, across nearly 5,000 summaries of scientific research. The results show that even when given explicit instructions, the AI-generated summaries omitted key details at five times the rate of human-written scientific summaries.
The researchers pointed out: "When summarizing scientific texts, LLMs may omit details that limit the scope of the research conclusions, leading to overgeneralization of the original results." More worrying still, as the chatbots are updated, their error rates are rising rather than falling, the opposite of what AI industry leaders have promised, and this comes at a time of rapidly growing adoption: between 2023 and 2025, ChatGPT use among American teenagers doubled from 13% to 26%. In the study, the older ChatGPT-4 Turbo was 2.6 times more likely to omit key details than the original versions, while the newer ChatGPT-4o was nine times more likely. Similarly, Meta's LLaMA 3.3 70B was 36.4 times more likely to overgeneralize than its predecessor.
Condensing large amounts of data into a few concise sentences is a complex task. Humans can intuitively extract broad lessons from specific experiences, but this ability is extremely difficult to build into chatbots. The researchers note that in fields such as clinical medicine, details are critical, and even minor omissions can have serious consequences. Deploying LLMs widely across industries, especially in medicine and engineering, therefore carries significant risks.
That said, the study also noted that the prompts given to LLMs significantly affect their answers; whether this extends to their ability to summarize scientific papers remains unknown, which points to a direction for future research. Overall, unless AI developers can effectively resolve these issues in new-generation LLMs, people may still need to rely on human-written summaries to accurately convey scientific reports.
Key points:
🧠 The study found that new-generation AI chatbots omit key details in up to 73% of cases when summarizing information.
📈 Newer chatbot versions have higher error rates, a trend made more concerning by rapidly rising usage among teenagers.
🔍 Prompts given to LLMs affect their answers, but how prompting influences their summarization of scientific papers needs further research.