According to a report by Science and Technology Daily, researchers at the Institute of Automation, Chinese Academy of Sciences, have reported an important advance: they have confirmed for the first time that multimodal large language models can spontaneously come to "understand" things during training, and that this understanding closely resembles human cognition. The finding not only opens a new path for probing how artificial intelligence "thinks," but also lays groundwork for building AI systems that understand the world the way humans do. The results have been published in the journal Nature Machine Intelligence.
Understanding is at the core of human intelligence. When we see a "dog" or an "apple," we not only recognize appearance features such as size, color, and shape; we also grasp what the object is for, the feelings it evokes, and its cultural significance. This multifaceted ability to understand underpins how we perceive the world. With the rapid development of large models such as ChatGPT, scientists have begun to ask whether these models can learn to "understand" things the way humans do from vast amounts of text and images.
Traditional artificial intelligence research has focused on object-recognition accuracy, with little discussion of whether a model truly "understands" what an object is. He Huiguang, a researcher at the Chinese Academy of Sciences, pointed out that although current AI can tell pictures of cats from pictures of dogs, how this "recognition" differs in essence from a human's understanding of cats and dogs still calls for in-depth study.
In this study, the research team drew inspiration from the cognitive principles of the human brain and designed a simple but telling experiment: having large models play an "odd one out" game alongside humans. Triplets of concepts were drawn from 1,854 common objects, and the player had to pick the one that fit least with the other two. By analyzing 4.7 million such judgments, the researchers built the first "concept map" of large models, a kind of mind map of how they organize objects.
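The article gives no implementation details, but the triplet task itself is easy to illustrate. The sketch below, in Python, shows one simple way such an odd-one-out judgment can be simulated from a model's concept embeddings; the cosine-similarity rule, the placeholder vectors, and the example concepts are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (not the authors' code): simulating a triplet
# "odd one out" judgment from concept embeddings.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def odd_one_out(names, vectors):
    """Return the concept least similar to the other two in a triplet.

    For each item, sum its similarity to the other two; the item with
    the lowest total similarity is judged the odd one out.
    """
    totals = []
    for i in range(3):
        others = [j for j in range(3) if j != i]
        totals.append(sum(cosine(vectors[i], vectors[j]) for j in others))
    return names[int(np.argmin(totals))]

# Hypothetical 4-d embeddings for three concepts (purely illustrative).
triplet = {
    "dog":   np.array([0.9, 0.8, 0.1, 0.2]),
    "cat":   np.array([0.8, 0.9, 0.2, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9, 0.8]),
}
print(odd_one_out(list(triplet), list(triplet.values())))  # -> "apple"
```

Collecting millions of such judgments, from models and from people, is what lets researchers compare how the two organize the same set of objects.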
From this data, the researchers distilled 66 key dimensions that capture artificial intelligence's "understanding" of things. These dimensions are not only readily interpretable but also align closely with the neural activity patterns in the brain regions responsible for processing objects. More importantly, multimodal models, which handle both text and images, came closer to humans in how they "think" through and make these choices.
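The article does not say how the 66 dimensions were obtained. In this line of research, a common approach (assumed here, not confirmed by the article) is to fit a low-dimensional, sparse, non-negative embedding so that it predicts the odd-one-out choices, and then interpret each learned dimension. The PyTorch sketch below illustrates that idea only; the random triplet data, number of training steps, and hyperparameters are made up, while the 1,854 objects and 66 dimensions come from the article.

```python
# Illustrative sketch (assumption, not the paper's code): learning a
# sparse, non-negative embedding from odd-one-out triplet judgments,
# so that individual dimensions become interpretable.
import torch

n_items, n_dims, n_triplets = 1854, 66, 10_000

# Fake triplet data: each row (i, j, k) records that k was chosen as the
# odd one out, i.e. the pair (i, j) was judged the most similar pair.
triplets = torch.randint(0, n_items, (n_triplets, 3))

emb = torch.nn.Parameter(torch.rand(n_items, n_dims) * 0.1)
opt = torch.optim.Adam([emb], lr=0.01)

for step in range(200):
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    s_ij = (emb[i] * emb[j]).sum(dim=1)   # similarity of the "kept" pair
    s_ik = (emb[i] * emb[k]).sum(dim=1)
    s_jk = (emb[j] * emb[k]).sum(dim=1)
    # Softmax likelihood that (i, j) is the most similar pair of the three.
    logits = torch.stack([s_ij, s_ik, s_jk], dim=1)
    nll = -torch.log_softmax(logits, dim=1)[:, 0].mean()
    loss = nll + 0.01 * emb.abs().mean()  # L1 penalty encourages sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        emb.clamp_(min=0.0)               # keep the embedding non-negative

print(emb.shape)  # (1854, 66): one sparse, non-negative vector per concept
```

Sparsity and non-negativity are what make each dimension readable as a "perspective" on objects, which is also what allows the comparison with brain activity patterns described above.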
Interestingly, when humans make these judgments they weigh both an object's appearance and its meaning or function, whereas large models lean more heavily on the "text labels" and abstract concepts they have acquired. The finding indicates that large models have indeed developed a way of understanding the world that resembles the human one, opening a new chapter for artificial intelligence's capacity to understand.