Moonshot Introduces Kimi Linear, a New Hybrid Linear Attention Architecture

AIbase基地

Published in AI News · 3 min read · Oct 31, 2025

Moonshot AI recently released a new hybrid linear attention architecture called "Kimi Linear." According to the company, it outperforms traditional full attention across short-context, long-context, and reinforcement learning (RL) scenarios. Its core component, Kimi Delta Attention (KDA), refines Gated DeltaNet with a finer-grained gating mechanism that makes better use of the limited memory of a finite-state RNN.
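To make the idea of a gated delta rule more concrete, here is a minimal sketch in plain Python. It is not Moonshot AI's implementation: the function names `kda_style_step` and `read` are hypothetical, and the update rule is a simplified sequential form of a delta-rule recurrence with one decay gate per key channel, written under the assumption that this captures the "fine-grained gating" idea. The exact KDA formulation and its kernels are described in the technical report.

```python
# Illustrative sketch only; names and the exact update form are assumptions,
# not code from the Kimi Linear release.

def kda_style_step(state, k, v, alpha, beta):
    """One recurrent update of a gated delta rule with per-channel decay.

    state: d_k x d_v matrix (list of rows) acting as the fixed-size memory
    k, v:  key (length d_k) and value (length d_v) for the current token
    alpha: per-key-channel decay gates in [0, 1] (one gate per row)
    beta:  scalar write strength in [0, 1]
    """
    d_k, d_v = len(k), len(v)
    # 1) Forget: each key channel (row) decays by its own gate.
    decayed = [[alpha[i] * s for s in state[i]] for i in range(d_k)]
    # 2) Predict the value the memory currently associates with key k.
    pred = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]
    # 3) Delta rule: move the stored value toward v with step size beta.
    return [
        [decayed[i][j] + beta * k[i] * (v[j] - pred[j]) for j in range(d_v)]
        for i in range(d_k)
    ]


def read(state, q):
    """Query the memory: returns q^T S, a vector of length d_v."""
    return [
        sum(q[i] * state[i][j] for i in range(len(q)))
        for j in range(len(state[0]))
    ]


# Usage: write one key/value pair, then overwrite it with a new value.
state = [[0.0, 0.0], [0.0, 0.0]]
state = kda_style_step(state, k=[1.0, 0.0], v=[3.0, 4.0], alpha=[1.0, 1.0], beta=1.0)
first_read = read(state, [1.0, 0.0])
state = kda_style_step(state, k=[1.0, 0.0], v=[5.0, 6.0], alpha=[1.0, 1.0], beta=1.0)
second_read = read(state, [1.0, 0.0])
print(first_read, second_read)
```

With a single scalar gate shared by all channels, the same code would behave like a coarser, Gated DeltaNet-style update; giving each channel its own `alpha[i]` lets the memory forget some channels quickly while retaining others. The memory stays the same size regardless of how many tokens have been processed.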

Kimi Linear interleaves KDA layers with global Multi-Head Latent Attention (MLA) layers at a 3:1 ratio: three KDA layers for every MLA layer. Fine-grained gating lets the KDA layers compress context into a fixed-size RNN state, so only the MLA layers need a KV cache that grows with sequence length. According to Moonshot AI, in a 1M-token context Kimi Linear reduces KV cache usage by 75% and raises decoding throughput by up to six times; time per output token (TPOT) is 6.3 times faster than with traditional MLA.
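The reported 75% figure is consistent with simple arithmetic on the layer mix. The sketch below assumes that only the global MLA layers keep a per-token KV cache, while the KDA layers hold a fixed-size state that is negligible at long context lengths. The helper names `layer_pattern` and `kv_cache_reduction` are hypothetical, and the placement of the MLA layer at the end of each group of four is an illustrative choice, not a detail confirmed by the article.

```python
# Back-of-the-envelope sketch; layer placement and helper names are assumptions.

def layer_pattern(num_layers, kda_per_mla=3):
    """Return layer types with one MLA layer after every `kda_per_mla` KDA layers."""
    period = kda_per_mla + 1
    return ["MLA" if (i + 1) % period == 0 else "KDA" for i in range(num_layers)]


def kv_cache_reduction(pattern):
    """Fraction of per-token KV cache saved versus an all-MLA stack.

    Assumes KDA layers contribute no cache that grows with sequence length.
    """
    mla_layers = sum(1 for layer in pattern if layer == "MLA")
    return 1.0 - mla_layers / len(pattern)


pattern = layer_pattern(8)
print(pattern)
print(kv_cache_reduction(pattern))
```

Under these assumptions, one caching layer out of every four leaves a quarter of the original KV cache, which matches the 75% reduction cited above.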

The architecture targets a broad range of AI applications. Moonshot AI reports advantages both in information-dense natural language processing tasks and in reinforcement learning in dynamic environments. If these results hold up, efficient attention mechanisms of this kind could enable further advances in future AI applications.

More technical details are available in the Kimi Linear technical report: https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf

Key Points:
🌟 Kimi Linear is a new hybrid linear attention architecture that Moonshot AI says outperforms full attention in short-context, long-context, and RL scenarios.
🚀 In a 1M-token scenario, KV cache usage is reduced by 75% and decoding throughput is increased by up to six times.
🔍 Kimi Delta Attention (KDA) is its core technology; it uses fine-grained gating to make better use of finite-state RNN memory.


This article is from AIbase Daily
