Anthropic releases Claude 4, with programming and reasoning capabilities that outperform Gemini 2.5 Pro

AIbase基地

Published in AI News · 5 min read · May 23, 2025


Anthropic has officially released its Claude 4 model series, comprising Claude Opus 4 and Claude Sonnet 4. The release came without flashy slogans or lengthy papers; the only keyword was "getting things done." Anthropic describes Claude Opus 4 as the world's strongest programming model, capable of reliably handling complex, long-running tasks. Claude Sonnet 4, meanwhile, improves on both programming and reasoning and follows user instructions more precisely.

The Claude 4 series brings several new features. First, the models can use tools during extended thinking to refine their reasoning and improve response quality. Second, both models can use these tools in parallel and, with developer authorization, draw on improved memory to retain key information and maintain contextual coherence. In addition, the release of Claude Code makes the models more practical on platforms such as GitHub Actions, VS Code, and JetBrains.
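To make the idea of parallel tool use concrete, here is a minimal sketch in plain Python. It is not Anthropic's API or implementation: the tool names (`search_docs`, `run_tests`) and the helper `run_tools_in_parallel` are hypothetical, invented purely for illustration. It only shows the general pattern of dispatching several independent tool requests at once instead of one after another.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tools, invented for illustration; not part of any Anthropic API.
def search_docs(query: str) -> str:
    return f"docs for {query!r}"

def run_tests(path: str) -> str:
    return f"tests passed in {path}"

TOOLS = {"search_docs": search_docs, "run_tests": run_tests}

def run_tools_in_parallel(requests):
    """Run each (tool_name, argument) request concurrently.

    Results are returned in the same order as the requests.
    """
    with ThreadPoolExecutor(max_workers=len(requests) or 1) as pool:
        futures = [pool.submit(TOOLS[name], arg) for name, arg in requests]
        return [future.result() for future in futures]

results = run_tools_in_parallel([
    ("search_docs", "retry logic"),
    ("run_tests", "src/"),
])
```

Because the two requests do not depend on each other, neither has to wait for the other to finish, which is where the time savings of parallel tool use would come from.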

On the SWE-bench programming benchmark, Opus 4 scored 72.5%, placing it among the top performers, and on Terminal-bench it led competitors with 43.2%. Opus 4 can break down problems the way an experienced programmer would, debug accurately, and carry out complex tasks; it also excelled in Replit tests, successfully handling large-scale changes spanning multiple files.

Sonnet 4 may not be the strongest model in absolute terms, but it could be the more appealing choice for most developers. Compared with its predecessor, it has improved significantly in programming ability, logical reasoning, and controllability of responses, and its SWE-bench score of 72.7% is on par with Opus 4's 72.5%. Sonnet 4 handles complex instructions more clearly and produces more elegantly structured code, and it has been selected as the base model for the new generation of GitHub Copilot.

Anthropic has also refined the models' behavior and reasoning. The Claude 4 series can carry out complex reasoning tasks effectively and showed significantly fewer logical flaws in testing. A newly introduced "thinking summary" feature automatically condenses the model's reasoning when its thinking path becomes too long, so the final output is more concise and clear.

With the official launch of Claude Code, developers can more easily integrate the assistant into their workflows. Whether in a command-line terminal or a common IDE, Claude Code fits into real-world development scenarios and suggests code modifications that make development more efficient.

This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hot topics in AI with a focus on developers, helping you understand technical trends and learn about innovative AI product applications.

—— Created by the AIbase Daily Team
