Ant Group's Bailing large model team recently announced the open-source release of two new efficient reasoning models, Ring-flash-linear-2.0 and Ring-mini-linear-2.0, designed specifically to make deep reasoning more efficient. Alongside the models, the team released two high-performance fused operators developed in-house: an FP8 fused operator and a linear attention inference fused operator. Together, these aim to deliver efficient reasoning with "large parameters, low activation" and support for ultra-long contexts.
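A key ingredient here is the linear attention design reflected in the model names: instead of comparing every new token against all previous tokens (quadratic in sequence length), attention is accumulated in a fixed-size running state, so cost grows linearly and per-token memory stays constant. The sketch below illustrates the general idea in plain PyTorch; it is not the team's fused operator, and the feature map and normalization shown are common illustrative choices, not necessarily those used in the Ring models.

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v):
    """Causal linear attention via a fixed-size running state.

    q, k: (seq_len, d_k); v: (seq_len, d_v).
    Illustrative only: real inference kernels fuse these steps,
    and the Ring models' exact formulation is not shown here.
    """
    # A common positive feature map (an assumption, not Ring's choice).
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0

    seq_len, d_k = q.shape
    d_v = v.shape[-1]
    state = torch.zeros(d_k, d_v)   # running sum of outer(k_t, v_t)
    norm = torch.zeros(d_k)         # running sum of k_t
    out = torch.empty(seq_len, d_v)
    for t in range(seq_len):
        state += torch.outer(k[t], v[t])  # O(d_k * d_v) per step, independent of t
        norm += k[t]
        out[t] = (q[t] @ state) / (q[t] @ norm + 1e-6)
    return out
```

Because each step only updates a fixed-size state, the whole recurrence can be expressed as a single fused kernel at inference time, which is what makes ultra-long contexts practical on fixed memory budgets.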

According to the team, thanks to architectural optimization and the high-performance operators working in concert, the inference cost of the two new models in deep reasoning scenarios is only about one tenth that of dense models of the same scale, and more than 50% lower than that of the earlier Ring series. In practice, this means users can run complex reasoning workloads with significantly less compute.

The advantages of the new models are not limited to lower cost: the training and inference engine operators are closely aligned. This alignment allows long, stable, and efficient optimization during the reinforcement learning phase, which in turn lets the models maintain state-of-the-art (SOTA) results on multiple challenging reasoning benchmarks, giving users stronger tools for complex reasoning tasks.

As open-source projects, Ring-flash-linear-2.0 and Ring-mini-linear-2.0 have been released on multiple platforms, including Hugging Face and ModelScope, where developers can find more information and try the models out.
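For a quick start, models published on Hugging Face are typically loadable through the standard `transformers` API. The snippet below is a minimal sketch; the repository id is an assumption based on the model names, so check the actual model card on Hugging Face or ModelScope before running it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the announced model name; verify on the model card.
model_id = "inclusionAI/Ring-mini-linear-2.0"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # custom linear-attention architecture code
)

messages = [{"role": "user", "content": "Explain linear attention in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```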

With this open-source release, Ant Group's Bailing large model team not only demonstrates its technical strength in the AI field but also gives developers more efficient tools, helping them pursue further breakthroughs in AI research and development.