
This study tested four classic logic puzzles: the Tower of Hanoi, checker jumping, river crossing, and blocks world. These puzzles allow precise control over task complexity, making them ideal scenarios for measuring the reasoning abilities of language models. The results showed that standard LLMs were more accurate and efficient on simple tasks, while reasoning models fared somewhat better as complexity increased, yet both ultimately collapsed under high complexity.
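The Tower of Hanoi illustrates why these puzzles make complexity easy to control: a single parameter, the number of disks n, fixes the optimal solution length at exactly 2^n - 1 moves. The sketch below is only an illustration of that scaling, not the study's actual evaluation harness, and the function name `hanoi_moves` is our own choice.

```python
# Illustrative sketch (not the study's test harness): difficulty scales with a
# single parameter, the number of disks n, and the optimal solution length is
# exactly 2**n - 1 moves.

def hanoi_moves(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    """Return the optimal move sequence for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, dst, aux)    # move the top n-1 disks out of the way
        + [(src, dst)]                       # move the largest disk to the target peg
        + hanoi_moves(n - 1, aux, src, dst)  # stack the n-1 disks back on top of it
    )

if __name__ == "__main__":
    for n in range(1, 11):
        moves = hanoi_moves(n)
        # Optimal length grows exponentially: 1, 3, 7, 15, ...
        assert len(moves) == 2**n - 1
        print(f"n={n:2d} disks -> {len(moves):4d} moves")
```

Because the required move count grows exponentially while the rules stay fixed, researchers can dial task difficulty without changing the nature of the problem, which is what makes failures at high complexity so telling.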
Even more surprising, on the most complex tasks these models not only saw their accuracy drop to zero but also used fewer reasoning tokens. In other words, just when more effort was needed, their willingness and ability to "think" decreased.

The research team mapped the models' reasoning trajectories at different levels of complexity and identified two typical failure modes. In overthinking, models working on simple problems keep generating incorrect alternatives even after finding the correct answer. In thinking collapse, the reasoning process on highly complex problems halts abruptly, failing even to generate candidate solution paths.
Although reasoning models, with mechanisms such as "chains of thought" and "self-reflection," are often seen as a step toward artificial general intelligence (AGI), Apple's research indicates that these mechanisms have fundamental limitations in how they scale. Current reasoning models cannot formulate strategies that generalize across problems; their "thinking" is closer to statistical generation than to true logical deduction.
