Recently, SuperCLUE released its 2025 Annual Chinese Large Model Benchmark Report, drawing wide attention from tech enthusiasts. The evaluation covered 23 domestic and international large models across six core dimensions, including mathematical reasoning, scientific reasoning, and code generation. The results show that overseas closed-source models still hold the lead: Anthropic's Claude-Opus-4.5-Reasoning scored 68.25 points and ranked first, making it the standout performer of this round.
Close behind are Google's Gemini-3-Pro-Preview and OpenAI's GPT-5.2 (high), which scored 65.59 and 64.32 points to take second and third place. The strength of these overseas giants remains evident. Notably, however, domestic large models also turned in strong performances: the open-source Kimi-K2.5-Thinking and the closed-source Qwen3-Max-Thinking took fourth and sixth place with 61.50 and 60.61 points, respectively.
In specific subtasks, domestic models stood out. Kimi-K2.5-Thinking topped the code generation task with 53.33 points, while Qwen3-Max-Thinking tied with Gemini-3-Pro-Preview for first place in mathematical reasoning at 80.87 points. These results suggest that domestic models are moving from "following" to "running side by side," showing strong catch-up momentum.
Overall, overseas closed-source models still lead their domestic counterparts, but domestic open-source models performed well, holding a dominant position among the Top 5 open-source entries and demonstrating both their strength and their development potential. With continued technological progress and accelerating domestic research, the Chinese large-model field may soon deliver more surprises and challenges.