Today, we officially launched Ring-mini-2.0, a high-performance reasoning MoE model built on a deep optimization of the Ling-mini-2.0 architecture. Ring-mini-2.0 has 16B total parameters but activates only 1.4B of them at inference time, delivering reasoning capability on par with dense models below 10B.

The model performs particularly well on logical reasoning, coding, and math tasks, and supports a 128K long context, giving it strong performance across a wide range of application scenarios. Generation speed is also remarkable: Ring-mini-2.0 sustains over 300 tokens/s, and with further optimization it can exceed 500 tokens/s.

Image note: AI-generated illustration; image licensing provided by Midjourney.

To strengthen reasoning, Ring-mini-2.0 was further trained from Ling-mini-2.0-base, with joint optimization of Long-CoT SFT, large-scale RLVR (reinforcement learning with verifiable rewards), and RLHF significantly improving the model's stability and generalization on complex reasoning tasks. On multiple high-difficulty benchmarks, its performance clearly surpasses dense models below 10B and even rivals some larger MoE models, with logical reasoning a particular strength.
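To make the RLVR idea concrete, here is a minimal sketch of a verifiable reward function for math-style tasks. The `Answer:` extraction convention and the binary reward are illustrative assumptions, not details of Ring-mini-2.0's actual training recipe.

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with a binary, programmatically checkable reward.

    Assumes the model is prompted to end its chain of thought with a line
    like 'Answer: 42' (an illustrative convention, not the model's actual format).
    """
    match = re.search(r"Answer:\s*(.+)", completion)
    if match is None:
        return 0.0  # no parseable final answer -> no reward
    predicted = match.group(1).strip()
    # Exact string match as the verifier; real recipes often use symbolic
    # math checkers for math tasks or unit tests for code tasks.
    return 1.0 if predicted == reference_answer.strip() else 0.0

# The reward is 1.0 only when the extracted answer verifies.
print(verifiable_reward("Let x = 6*7.\nAnswer: 42", "42"))  # 1.0
print(verifiable_reward("Answer: 41", "42"))                # 0.0
```

The appeal of verifiable rewards is that they need no learned reward model: correctness is checked mechanically, which scales cleanly to the large RL runs described above.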

Ring-mini-2.0 also emphasizes efficiency by design. With a 1/32 expert activation ratio and an optimized MTP (multi-token prediction) layer architecture, it matches the performance of a dense model of roughly 7-8B parameters. This highly sparse, small-activation design lets it reach an inference speed of over 300 tokens/s on NVIDIA H20 GPUs, and combined with Expert Dual Streaming optimization it further reduces inference cost.
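As a rough illustration of what a 1/32 expert activation ratio means, the sketch below routes each token to the top-k experts out of a larger pool (here 8 of 256, purely illustrative numbers chosen so that k/E = 1/32). The layer sizes and routing details are assumptions for the example, not Ring-mini-2.0's published configuration.

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS = 256   # illustrative expert pool size
TOP_K = 8           # 8 / 256 = 1/32 expert activation ratio
D_MODEL = 64        # toy hidden size

# Toy expert weights: each expert is a single linear map here.
experts = torch.randn(NUM_EXPERTS, D_MODEL, D_MODEL) * 0.02
router = torch.randn(D_MODEL, NUM_EXPERTS) * 0.02

def moe_layer(x: torch.Tensor) -> torch.Tensor:
    """Sparse MoE forward pass: each token activates only TOP_K experts,
    so per-token compute scales with TOP_K, not NUM_EXPERTS."""
    logits = x @ router                           # (tokens, NUM_EXPERTS)
    weights, idx = torch.topk(logits, TOP_K, dim=-1)
    weights = F.softmax(weights, dim=-1)          # renormalize over chosen experts
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                   # naive per-token loop, for clarity
        for slot in range(TOP_K):
            e = idx[t, slot]
            out[t] += weights[t, slot] * (x[t] @ experts[e])
    return out

tokens = torch.randn(4, D_MODEL)
print(moe_layer(tokens).shape)  # torch.Size([4, 64])
```

The point of the design is visible in the loop bound: per-token compute touches only TOP_K expert matrices, while model capacity scales with the full NUM_EXPERTS pool.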

To support research and applications in academia and industry, the model weights, training strategies, and data recipes of Ring-mini-2.0 will be fully open-sourced. We expect this "small but excellent" model to become a go-to choice for small reasoning models, and we welcome everyone to download it from our open-source repository. Going forward, building on the Ling 2.0 architecture, we will continue to release larger, faster, and stronger language and multimodal models. Stay tuned!
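For readers who want to try the open-sourced weights, here is a minimal loading sketch using Hugging Face Transformers. The repository id `inclusionAI/Ring-mini-2.0` is an assumed placeholder based on this announcement; confirm the actual name on the official release page.

```python
# A minimal sketch, assuming the weights ship in a standard Hugging Face
# Transformers layout; the repo id below is an assumed placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "inclusionAI/Ring-mini-2.0"  # assumed repo id, not confirmed here

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # 16B total weights; bf16 halves memory
    device_map="auto",
    trust_remote_code=True,       # custom MoE/MTP code may ship with the repo
)

prompt = "Prove that the sum of two even numbers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```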