At midnight on October 14, Ant Group officially released the trillion-parameter reasoning model Ring-1T and fully open-sourced its weights and training recipe. Building on the preview version Ring-1T-preview released on September 30, Ring-1T further scales up reinforcement learning with verifiable rewards (RLVR), strengthening the natural-language reasoning capabilities of the trillion-parameter base model; subsequent RLHF training improves its general capabilities, giving it more balanced performance across task benchmarks.
To further strengthen Ring-1T's mathematical and complex reasoning abilities, the BaiLing team tested it on the harder problems of IMO 2025 (International Mathematical Olympiad), plugging Ring-1T into the multi-agent framework AWorld and solving the problems with pure natural-language reasoning. In these experiments, Ring-1T solved Problems 1, 3, 4, and 5 in a single attempt, a result at IMO silver-medal level, making it the first open-source system to reach award level at an international math olympiad. On its third attempt, it produced a near-perfect proof for Problem 2, and on Problem 6, where nearly all top-tier large models fail, it converged to the answer 4048 (the correct answer is 2112), the same incorrect answer given by Gemini 2.5 Pro.
As a reasoning model, Ring-1T also showed strong general capabilities. On the human-preference-alignment benchmark Arena-Hard V2, it ranked first among open-source models with an 81.59% win rate, approaching GPT-5-Thinking (High) at 82.91%. On HealthBench, a medical Q&A benchmark for a high-stakes domain, it achieved the highest score among open-source models.
(Performance comparison of Ring-1T with representative industry reasoning models)
The biggest challenge in training a trillion-parameter reasoning model is the precision discrepancy between training and inference: implementation differences between the two stages cause the token distributions they compute to drift apart, and the accumulated mismatch can ultimately cause training to collapse. To tackle this industry-wide problem, Ant applied its self-developed "Icepop" algorithm to Ring-1T. Icepop uses masked bidirectional truncation to freeze the training-inference distribution gap at a low level, keeping long-sequence, long-horizon training stable. In addition, for reinforcement learning on trillion-parameter models, Ant built the high-performance RL system ASystem (which includes the already open-sourced high-performance RL framework AReaL), with fine-grained optimizations for memory management and training-inference weight exchange at the trillion-parameter scale: sub-second recovery of memory fragmentation on a single machine and zero-redundancy weight exchange, making large-scale RL training stable and routine.
(Left: GRPO training-inference discrepancy increases exponentially with training; Icepop remains relatively stable. Right: Maximum training-inference discrepancy, GRPO shows a significant increase with training, while Icepop maintains a low level.)
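To make the idea of masked bidirectional truncation concrete, the sketch below shows one plausible way to apply it in a policy-gradient update: compute the per-token probability ratio between the training engine and the inference (rollout) engine, mask out tokens whose ratio falls outside a two-sided band, and let only the remaining tokens contribute gradient. The function name, thresholds, and loss form here are illustrative assumptions, not the official Icepop implementation.

```python
import torch

def masked_truncation_pg_loss(logp_train, logp_infer, advantages,
                              ratio_low=0.5, ratio_high=2.0):
    """Policy-gradient loss with masked bidirectional truncation (illustrative).

    logp_train: per-token log-probs recomputed by the training engine
    logp_infer: per-token log-probs recorded by the inference/rollout engine
    advantages: per-token advantage estimates
    Thresholds and loss form are assumptions, not the official Icepop recipe.
    """
    # Ratio between training-time and rollout-time token probabilities.
    ratio = torch.exp(logp_train - logp_infer.detach())

    # Bidirectional mask: tokens whose train/infer discrepancy exceeds the band
    # in either direction are dropped and contribute no gradient.
    keep = ((ratio >= ratio_low) & (ratio <= ratio_high)).float()

    # Importance-weighted policy-gradient term over the surviving tokens only.
    per_token = -ratio * advantages * keep
    return per_token.sum() / keep.sum().clamp(min=1.0)
```

Unlike standard ratio clipping, out-of-band tokens are removed from the update entirely, so a growing training-inference gap cannot keep feeding gradient into the policy.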
In addition, the released Ring-1T continues to use the 1T base model built on the Ling 2.0 architecture for post-training. Ling 2.0 combines a highly sparse Mixture-of-Experts (MoE) architecture with a 1/32 expert activation ratio, FP8 mixed-precision training, multi-token prediction (MTP), and other features to achieve efficient training and inference. In the post-training phase, the BaiLing team substantially strengthened the model's complex reasoning as well as its instruction-following and creative-writing abilities through a multi-stage pipeline of LongCoT SFT, RLVR, and RLHF.
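For intuition on what a 1/32 activation ratio means, the sketch below implements a plain top-k MoE layer that routes each token to 8 of 256 experts. The expert count, hidden sizes, and routing scheme are illustrative assumptions chosen only to show the ratio, not Ling 2.0's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy top-k MoE layer: each token activates 8 of 256 experts (a 1/32 ratio).
    Sizes and routing are illustrative assumptions, not Ling 2.0's real config."""

    def __init__(self, d_model=64, d_ff=128, num_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (num_tokens, d_model)
        scores = self.router(x)                    # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        # Simple loops for clarity; real systems use fused/grouped expert dispatch.
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                rows = idx[:, slot] == e
                out[rows] += weights[rows, slot].unsqueeze(-1) * self.experts[e](x[rows])
        return out

layer = SparseMoELayer()
y = layer(torch.randn(16, 64))  # 16 tokens, each handled by only 8 of 256 experts
```

The point of the sparsity is that only the selected experts run per token, so the compute per token stays a small fraction of the total parameter count.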
According to the BaiLing team, Ring-1T is their first attempt at a trillion-parameter reasoning model, and they will continue to improve its performance in subsequent versions. The model can currently be downloaded from HuggingFace and the Moda community, and tried online through platforms such as the Ant Treasure Box.
To date, the Ant BaiLing family has released 18 models, forming a matrix of large language models ranging from 16 billion to 1 trillion total parameters. These include two trillion-parameter models: Ling-1T, a general-purpose large language model, and Ring-1T, a reasoning model. With their release, the BaiLing large model line has officially entered its 2.0 phase.