Recently, the artificial intelligence community received a major announcement: Anthropic officially released its Claude 4 series, consisting of Claude Opus 4 and Claude Sonnet 4. The release came without flashy slogans or lengthy papers; its one keyword was "getting things done." Anthropic describes Claude Opus 4 as the world's best coding model, able to sustain strong performance on complex, long-running tasks. Claude Sonnet 4 improves on its predecessor in both coding and reasoning, and it follows user instructions more precisely.
The Claude 4 series brings several notable new features. First, the models can call external tools during extended thinking, alternating between reasoning and tool use to refine their reasoning and improve response quality. Second, both models can use tools in parallel. When developers give them access to local files, they also show markedly better memory, saving key information so that context stays coherent over long tasks. In addition, Claude Code is now generally available, which makes the series even more practical through GitHub Actions and IDEs such as VS Code and JetBrains.
On the SWE-bench Verified coding benchmark, Opus 4 scored 72.5%, placing it among the top performers, and on Terminal-bench it led the field with 43.2%. Beyond benchmarks, Opus 4 can break problems down the way an experienced programmer would, debug accurately, and carry out complex tasks. Replit, for example, reported improved precision and notable gains on complex changes spanning multiple files in its projects.
Sonnet 4 may not be the strongest model in the lineup, but it is likely to appeal to more developers. Compared with its predecessor, it shows clear improvements in coding ability, logical reasoning, and controllability, and it scored 72.7% on SWE-bench, on par with Opus 4 on that benchmark. It handles complex instructions more clearly and produces more elegant code structure. GitHub has chosen it to power the new coding agent in GitHub Copilot.
Anthropic has also refined the models' behavior and reasoning. In testing, the Claude 4 models handled complex reasoning tasks effectively and showed significantly fewer logical flaws. A newly introduced "thinking summaries" feature automatically condenses the model's reasoning when its chain of thought grows too long, so that the final output stays concise and clear.
With Claude Code now officially available, developers can more easily integrate this AI assistant into their workflows. Whether in a command-line terminal or a common IDE, Claude Code works inside real development environments and proposes code edits directly, making the development process more efficient.