Recently, an evaluation by the AI lab Andon Labs has attracted widespread attention. The study found that cleaning robots controlled by leading large language models completed a simple household task at most 40% of the time, far below human performance. The test required each robot to carry out the multi-step instruction "pass the butter to the person," which involved navigating to another room, identifying the butter's packaging, locating a person who may have moved, completing the delivery, and returning to the dock to recharge.

The results show that the best-performing model, Gemini 2.5 Pro, achieved a success rate of 40%, while Claude Opus 4.1 and GPT-5 achieved 37% and 30%, respectively. These figures suggest that robots driven by even the most advanced models still have significant shortcomings in spatial reasoning, environmental understanding, and long-horizon task planning.

The research team stressed that the robots not only performed poorly in a home setting but may also pose safety risks. For example, some could be tricked into leaking confidential information, and others could fall down stairs because they failed to recognize them as a hazard. These findings expose security vulnerabilities that arise when current large language models (LLMs) are embedded in physical machines, and they are a reminder that, as substantial capital flows into robotics, the engineering and safety problems deserve equal attention.

A significant gap remains between strong text generation and the reliable execution of tasks in the physical world. Before AI robots can truly enter the home, many challenges must still be overcome, particularly in stability and safety.

Key Points:  

🧑‍🔬 The study found that cleaning robots equipped with large models succeeded at most 40% of the time when executing a multi-step instruction.  

🚨 The robots showed clear weaknesses in spatial reasoning and environmental understanding.  

🔒 Robots may leak confidential information or fail to recognize environmental risks, posing safety hazards.