Recently, the SiliconFlow large-model service platform officially launched Ling-mini-2.0, the latest open-source model from Ant Group's BaiLing team. The model combines near state-of-the-art performance with extremely fast generation, showing that a small model can deliver serious capability.


Ling-mini-2.0 adopts a Mixture-of-Experts (MoE) architecture with 16B total parameters, of which only 1.4B are activated per token during generation, which greatly speeds up decoding. Despite the small activated footprint, the model is reported to match or exceed dense language models under 10B parameters as well as larger MoE models, and it supports a maximum context length of 128K tokens, greatly expanding its range of applications.
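To see why so few parameters are active per token, here is a minimal, generic sketch of top-k expert routing in PyTorch. This is not Ling-mini-2.0's actual implementation, and all layer sizes and expert counts below are invented for illustration: a router scores each token, only the top-k experts run on it, so roughly k/n of the expert parameters are exercised per token.

```python
# Minimal sketch of top-k MoE routing (illustrative only; sizes are made up).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=32, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)
        weights, idx = logits.topk(self.k, dim=-1)   # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)         # normalize the k routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e             # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        # Only k of n_experts ran per token, so only ~k/n of expert params were active.
        return out
```

In the same spirit, Ling-mini-2.0's 1.4B-of-16B activation ratio means each token pays roughly the compute cost of a ~1.4B dense model while the full 16B of capacity remains available across tokens.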


In benchmark tests, Ling-mini-2.0 performs strongly on reasoning tasks across multiple domains, including coding, mathematics, and knowledge-intensive reasoning, demonstrating solid all-around reasoning ability. On difficult tasks in particular, it outperforms many models of comparable size.

Ling-mini-2.0 also stands out on generation speed. On question-answering tasks within 2,000 tokens, it generates more than 300 tokens per second, over twice the speed of a traditional 8B dense model. The relative speedup grows with output length, reaching up to 7x.

To make adoption easier, the SiliconFlow platform provides multiple access options and API documentation, and lets developers compare and combine models on the platform to build generative AI applications. Several large-model APIs on the platform are free to use, further lowering the barrier to applying AI technology.
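As a concrete starting point, the sketch below streams a completion through SiliconFlow's OpenAI-compatible chat-completions endpoint. The base URL follows SiliconFlow's public API documentation, but the exact model identifier used here is an assumption; check the platform's model list for the current name.

```python
# Hedged sketch: calling the model via SiliconFlow's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # SiliconFlow's OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",                    # issued in the platform console
)

response = client.chat.completions.create(
    model="inclusionAI/Ling-mini-2.0",         # assumed model ID; verify on the platform
    messages=[{"role": "user", "content": "Explain MoE activation in one sentence."}],
    max_tokens=256,
    stream=True,                               # stream tokens to observe generation speed
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```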

Key Points:

🧠 Ling-mini-2.0 has 16B total parameters but activates only 1.4B per token, enabling efficient generation.

🚀 The model supports a maximum context length of 128K and demonstrates strong reasoning across coding, math, and knowledge tasks.

💻 The SiliconFlow platform offers multiple access options, making it easy for developers to use a range of large-model APIs.