With the same computing power and data, why do some models perform better? Moonshot AI offers a fundamental answer.

On March 16, Kimi released a major technical report, "Attention Residuals." The research rethinks a "foundation" that large models have relied on since 2015: residual connections (Residual Connections). Experiments show that, at equal computing power, the new method matches the performance of a baseline model trained with 1.25 times the compute.


The breakthrough quickly caused a stir in the Silicon Valley AI community, with researchers publicly praising it on social media as "Impressive work from Kimi."

Jerry Tworek (a lead creator of OpenAI's o1) called it the beginning of "Deep Learning 2.0."

Andrej Karpathy (OpenAI founding member) said the industry still has room to deepen its understanding of "Attention Is All You Need."

Why modify the “time-honored foundation”?

Traditional residual connections solved the problem of training deep networks, but their "equal addition" (x + F(x)) is crude: as the network deepens, each layer's new contribution tends to be drowned out by the accumulated signal, leaving many intermediate layers doing little useful work.
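To make the "equal addition" point concrete, here is a minimal NumPy sketch of a standard residual stack. It is purely illustrative (the layer function and dimensions are invented, not from the report): every layer's output is added to the running stream with the same unit weight, so deep in the network a single layer's contribution is small relative to the accumulated sum.

```python
import numpy as np

def sub_layer(x, W):
    """A toy sub-layer F: a linear map followed by a nonlinearity."""
    return np.tanh(x @ W)

def residual_stack(x, weights):
    """Standard residual connections: x_{l+1} = x_l + F(x_l).
    Each layer's output is added with equal (unit) weight, so the
    stream is a plain running sum over depth."""
    for W in weights:
        x = x + sub_layer(x, W)  # "equal addition"
    return x

rng = np.random.default_rng(0)
d = 8  # hidden size (arbitrary for illustration)
weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
x0 = rng.standard_normal(d)
out = residual_stack(x0, weights)
print(out.shape)  # (8,)
```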


Kimi's “Elegant Rotation”:

The team observed that information loss along the depth dimension closely mirrors forgetting along the time dimension in RNNs. So they took the attention mechanism, originally applied horizontally across a text sequence, and rotated it 90 degrees to operate vertically across the network's depth.

With this change, each layer no longer passively receives the accumulated stream; instead, through a small "query vector," it actively and selectively decides how much information to draw from earlier layers. To control the memory overhead in large-scale training, the team also proposed a Block AttnRes scheme that partitions the network into blocks, preserving the performance gain while keeping the increase in inference latency under 2%.
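The mechanism described above can be sketched as follows. This is an assumption-laden toy in NumPy, not the paper's actual formulation: the query projection, the use of layer outputs as both keys and values, and all dimensions are invented for illustration. The key idea it demonstrates is that each new layer attends over the outputs of all previous layers instead of simply adding to a running sum.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def depth_attention_step(history, q_proj, layer_fn):
    """One layer of attention over the depth dimension (illustrative).
    Instead of x + F(x), the layer forms a small query from its input
    and softmax-weights the outputs of all previous layers, selectively
    deciding how much to extract from each."""
    x = history[-1]
    q = x @ q_proj                       # this layer's query vector
    keys = np.stack(history)             # earlier layers' outputs as keys/values
    scores = keys @ q / np.sqrt(len(q))  # scaled dot-product over depth
    weights = softmax(scores)
    context = weights @ keys             # selectively pooled depth information
    history.append(context + layer_fn(context))
    return history

rng = np.random.default_rng(1)
d = 8  # hidden size (arbitrary for illustration)
q_proj = rng.standard_normal((d, d)) * 0.1
W = rng.standard_normal((d, d)) * 0.1
layer_fn = lambda v: np.tanh(v @ W)

history = [rng.standard_normal(d)]
for _ in range(4):
    history = depth_attention_step(history, q_proj, layer_fn)
print(len(history), history[-1].shape)  # 5 (8,)
```

A block-partitioned variant in the spirit of Block AttnRes would simply restrict `history` to the layers within the current block, bounding the memory cost of storing past layer outputs.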


In experiments, the architecture showed strong generalization: a 7.5% improvement on the GPQA-Diamond science-reasoning benchmark, and gains of 3.6% and 3.1% on math and code-generation tasks, respectively.


As the founder said in his talk at GTC 2026, the industry is running up against the limits of scaling and must rebuild foundational components such as optimizers and residual connections. While most are still busy with "high-level renovation," Kimi chose to dig down to the deepest layer, placing a decisive bet on the future of deep learning.