Kimi Linear is an efficient hybrid linear-attention architecture that outperforms traditional full attention in short-context, long-context, and reinforcement-learning scenarios. It optimizes attention computation through the Kimi Delta Attention (KDA) mechanism, significantly improving performance and hardware efficiency, and handles long-context tasks of up to 1 million tokens.
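To make the efficiency claim concrete, here is a minimal sketch of the delta-rule recurrence that linear-attention variants in the KDA family build on. This is an illustrative simplification, not Kimi Linear's actual implementation: KDA adds fine-grained gating and hardware-optimized kernels, and the dimensions and learning rates (`beta`) below are assumptions. The key point is that the state is a fixed-size d×d matrix, so each token costs O(d²) regardless of sequence length.

```python
import numpy as np

def delta_rule_linear_attention(q, k, v, beta):
    """Illustrative delta-rule recurrence underlying linear-attention
    variants such as KDA (simplified sketch, not the Kimi implementation).

    q, k, v: (T, d) query/key/value sequences; beta: (T,) write strengths.
    The recurrent state S is a d x d associative matrix, updated once per
    token, so per-step cost is O(d^2) independent of sequence length T.
    """
    T, d = q.shape
    S = np.zeros((d, d))          # recurrent associative state
    out = np.empty((T, d))
    for t in range(T):
        k_t, v_t = k[t], v[t]
        # Delta rule: erase the old value bound to k_t, write the new one:
        # S_t = S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
        S = S - beta[t] * np.outer(S @ k_t, k_t) + beta[t] * np.outer(v_t, k_t)
        out[t] = S @ q[t]         # read-out for the current query
    return out
```

Because the state never grows with context length, memory stays constant at 1 million tokens, unlike the linearly growing KV cache of full attention.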
Natural Language Processing
Transformers