Aider Leaderboard Publishes Test Results, Kimi K2 Programming Ability Is Comparable to Qwen3-235B-A22B

AIbase基地

Published inAI News · 7 min read · Jul 18, 2025

21

Recently, the Aider Leaderboard released its latest test results, highlighting that Kimi K2, an open-source model developed by Moonshot AI, performed exceptionally well in programming tasks. Its programming capabilities are comparable to Qwen3-235B-A22B and are close to those of o3-mini-high and Claude-3.7-Sonnet. With its low cost and high performance, Kimi K2 is considered an ideal choice for terminal coding agents, sparking intense discussions within the developer community.

Aider Leaderboard Reveals: Kimi K2 Shines in Programming Capabilities

The Aider Leaderboard is an authoritative benchmark for evaluating the code editing capabilities of large language models (LLMs), covering multilingual programming tasks and complex code editing scenarios. In the latest tests, Kimi K2 achieved results comparable to Qwen3-235B-A22B, ranking among the top open-source models. Its performance is slightly behind o3-mini-high and Claude-3.7-Sonnet but offers a significant advantage in reasoning costs, showcasing the unique competitiveness of open-source models in terms of cost-effectiveness.

Kimi K2 uses a mixture-of-experts (MoE) architecture with a total parameter count of 1 trillion, and it activates 32 billion parameters per inference. It supports a context length of 128k. This efficient design enables it to perform exceptionally well in handling complex programming tasks, especially in scenarios requiring precise code replacement and multi-step tasks.

Low Cost, High Performance: The Ideal Choice for Terminal Coding

Kimi K2's inference cost is significantly lower than that of proprietary models like Claude-4-Sonnet, at only $0.14 per million input tokens and $2.49 per million output tokens, roughly one-third of Claude-4-Sonnet’s cost. This low-cost feature makes it the preferred choice for developers building terminal coding agents. Combined with the Claude Code environment, Kimi K2 can efficiently perform code editing, file operations, and shell commands, serving as the "intelligent brain of the Linux terminal."

Kimi AI 、月之暗面

In actual testing, Kimi K2 achieved a single-attempt accuracy rate of 65.8% on the SWE-bench Verified test, surpassing GPT-4.1 (54.6%) and trailing only behind Claude-4-Sonnet. On benchmarks such as LiveCodeBench and EvalPlus, Kimi K2 scored 53.7% and 80.3%, respectively, leading among open-source models. These results demonstrate that Kimi K2 has reached industry-leading levels in code generation and tool invocation.

Diverse Application Scenarios: From Web Generation to Complex Agent Tasks

Kimi K2 not only excels in programming tasks but also demonstrates strong potential across multiple application scenarios. Developer feedback indicates that Kimi K2 performs particularly well in web generation, even surpassing Claude-4-Sonnet in certain tasks. Its agent capabilities support continuous tool calls and autonomous task execution, making it suitable for automated workflows, code debugging, and multi-step task processing. For example, in a video-to-text workflow, Kimi K2 can fully execute Python scripts, while other models like GPT-4.1 may fail due to missing steps.

Additionally, Kimi K2 supports inference frameworks such as vLLM and Hugging Face. Developers can deploy it via Moonshot AI’s API (https://platform.moonshot.ai) or Hugging Face model weights, greatly lowering the entry barrier. Its open-source nature (under the MIT license) and compatibility with various inference engines further promote widespread community adoption.

Landmark in Open-Source AI

AIbase believes that Kimi K2’s outstanding performance marks an important step forward for open-source AI models in the field of programming. Its high performance, low cost, and strong agent capabilities not only challenge the dominance of proprietary models but also provide opportunities for small and medium-sized development teams to build intelligent coding tools. Kimi K2’s release further confirms the leadership of Chinese AI companies in the global open-source ecosystem, and it is expected to drive innovation in more fields in the future.

Currently, Kimi K2 is available through the Moonshot AI platform and tools like Cline. Developers can test it in the Claude Code environment. The official also provides detailed deployment guides, supporting inference engines such as vLLM and SGLang, making it easy for developers to get started quickly.

Future Outlook: A New Chapter in Agent Intelligence

This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to exploring the world of artificial intelligence. Every day, we present you with hot topics in the AI field, focusing on developers, helping you understand technical trends, and learning about innovative AI product applications.

—— Created by the AIbase Daily Team

Product Finder

Product Submit

AI Models Finder

MCP Servers

MCP Client

MCP Inspector

Case Tutorials

Latest AI News

AI Daily Brief

Aider Leaderboard Publishes Test Results, Kimi K2 Programming Ability Is Comparable to Qwen3-235B-A22B

AIbase基地

This article is from AIbase Daily