rStar2-Agent, a model open-sourced by Microsoft Research, has drawn attention in the field of AI mathematical reasoning. Trained with agentic reinforcement learning, this 14-billion-parameter model outperforms the 671-billion-parameter DeepSeek-R1 on several mathematical benchmarks.

The core innovation of rStar2-Agent is that it moves beyond purely text-based chain-of-thought (CoT) reasoning to an agentic interaction mechanism. The model plans its own reasoning, writes and executes Python code to check intermediate results, and revises its steps based on the execution feedback. This reduces the error accumulation common in conventional CoT, where an early mistake silently propagates through every later step.
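The verify-and-revise loop described above can be sketched as a simple alternation between model actions and tool feedback. This is a minimal illustration, not Microsoft's implementation: `scripted_policy` is a hypothetical stand-in for the model that deliberately writes buggy code first, and `run_python` uses a plain subprocess, which is not a secure sandbox.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 5.0) -> str:
    """Run code in a fresh Python process; return stdout on success, stderr on failure.

    NOTE: a plain subprocess is NOT a secure sandbox; it only illustrates the feedback loop.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    return result.stdout if result.returncode == 0 else result.stderr


def scripted_policy(history):
    """Hypothetical stand-in for the model.

    Turn 0 writes buggy code, turn 1 fixes it after seeing the error,
    turn 2 reports the verified tool output as the final answer.
    """
    tool_turns = sum(1 for role, _ in history if role == "tool")
    if tool_turns == 0:
        return "code", "print(sum(range(1, 101)) // undefined_name)"
    if tool_turns == 1:
        return "code", "print(sum(range(1, 101)))"
    return "answer", history[-1][1].strip()


def agent_solve(policy, max_turns: int = 4):
    """Alternate between model actions and tool feedback until an answer or the turn limit."""
    history = []
    for _ in range(max_turns):
        kind, content = policy(history)
        if kind == "answer":
            return content, history
        history.append(("assistant", content))
        # The execution result (output or traceback) is what the model sees next turn.
        history.append(("tool", run_python(content)))
    return None, history


if __name__ == "__main__":
    answer, history = agent_solve(scripted_policy)
    print(answer)  # 5050
```

The first tool call returns a `NameError` traceback, and the policy uses that feedback to correct its code before answering, which is the kind of self-correction that plain CoT cannot perform.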

On established competition benchmarks such as the American Invitational Mathematics Examination (AIME), rStar2-Agent performed strongly. On AIME24 it reached 80.6% pass@1 accuracy, ahead of DeepSeek-R1 (79.8%), o3-mini (79.6%), and Claude Opus 4.0 (77.0%). It also scored 69.8% on AIME25 and 52.7% on HMMT25.


Notably, rStar2-Agent's responses are also considerably shorter: about 9,340 tokens on average on AIME24 and about 10,943 tokens on AIME25, roughly half the length of DeepSeek-R1's, which points to more efficient reasoning.

Training is efficient as well: the model needs only 510 reinforcement learning (RL) steps, completed within one week on 64 AMD MI300X GPUs. Its RL infrastructure handles up to 45,000 concurrent tool calls per step with an average latency of only 0.3 seconds.

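To cope with environment noise from code execution, such as failed or erroneous tool calls, the model introduces the GRPO-RoC algorithm (Group Relative Policy Optimization with Resample-on-Correct). Its Resample-on-Correct strategy first oversamples rollouts, then keeps only the highest-quality successful trajectories, those with the fewest tool errors and formatting problems, while downsampling failed trajectories uniformly. This keeps clean reasoning traces in the training signal and improves training effectiveness.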
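The resampling idea can be illustrated with a short sketch, assuming a binary correctness reward and per-trajectory counts of tool errors and formatting issues. Everything here is hypothetical: the `Trajectory` fields, the `penalty` score, the ratio-preserving split, and the deterministic selection of the cleanest positives are simplifications (the published method may, for example, sample positives probabilistically). The advantage function is the standard GRPO group normalization, and rStar2-Agent's exact normalization may differ.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Trajectory:
    correct: bool       # final answer matches the reference (binary outcome reward)
    tool_calls: int     # number of code-execution calls in the rollout
    tool_errors: int    # calls that returned an error (assumed field)
    format_issues: int  # formatting violations, e.g. malformed tool tags (assumed field)


def penalty(t: Trajectory) -> float:
    """Hypothetical quality score: failed-call ratio plus format issues (lower is better)."""
    err_ratio = t.tool_errors / t.tool_calls if t.tool_calls else 0.0
    return err_ratio + t.format_issues


def resample_on_correct(rollouts, group_size, rng):
    """Downsample an oversampled group to group_size trajectories.

    Positives: keep the lowest-penalty ones (deterministic simplification).
    Negatives: sample uniformly at random.
    """
    positives = [t for t in rollouts if t.correct]
    negatives = [t for t in rollouts if not t.correct]
    # Assumption: preserve the oversampled group's positive/negative ratio.
    n_pos = min(len(positives), round(group_size * len(positives) / len(rollouts)))
    n_neg = min(len(negatives), group_size - n_pos)
    kept_pos = sorted(positives, key=penalty)[:n_pos]
    kept_neg = rng.sample(negatives, n_neg)
    return kept_pos + kept_neg


def group_advantages(group):
    """Standard GRPO advantage: each reward normalized within its group."""
    rewards = [1.0 if t.correct else 0.0 for t in group]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-6) for r in rewards]


if __name__ == "__main__":
    # Oversampled group of 8 rollouts, downsampled to 4.
    rollouts = [
        Trajectory(True, 3, 0, 0),
        Trajectory(True, 4, 2, 0),
        Trajectory(True, 2, 0, 1),
        Trajectory(True, 1, 1, 0),
        Trajectory(False, 2, 1, 0),
        Trajectory(False, 3, 3, 0),
        Trajectory(False, 1, 0, 0),
        Trajectory(False, 0, 0, 1),
    ]
    kept = resample_on_correct(rollouts, group_size=4, rng=random.Random(0))
    print(group_advantages(kept))
```

Filtering only the positive side is the key design choice: noisy but successful trajectories would otherwise be reinforced, while failed trajectories are kept at random so the model still sees diverse mistakes.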

rStar2-Agent also generalizes beyond mathematics: it outperforms DeepSeek-V3 on the GPQA-Diamond scientific reasoning benchmark, and it performs well on the BFCL v3 tool-use benchmark and on general benchmarks such as IFEval and Arena-Hard, suggesting that agentic reinforcement learning also benefits general capabilities.

Microsoft has open-sourced rStar2-Agent's code and training recipe, which implements multi-stage reinforcement learning on top of the VERL framework. The result shows that with well-designed training strategies, small models can match much larger ones on specific tasks, opening new possibilities for researchers and developers with limited resources.