Microsoft has open-sourced rStar2-Agent, an AI agent reasoning model trained with an innovative agent reinforcement learning approach. Despite having only 14 billion parameters, it achieved 80.6% accuracy on the AIME24 math reasoning benchmark, surpassing the 671-billion-parameter DeepSeek-R1 (79.8%), a result that has prompted a reassessment of the relationship between model size and performance.
rStar2-Agent's strong results are not limited to math reasoning. On the GPQA-Diamond science reasoning benchmark it scored 60.9%, above DeepSeek-V3's 59.1%, and on the BFCL v3 agent tool-use benchmark its task completion rate reached 60.8%, also higher than DeepSeek-V3's 57.6%. Together, these results show that the model generalizes well across task types.
To achieve this breakthrough, Microsoft made innovations in three areas: training infrastructure, the RL algorithm, and the training recipe. First, on the infrastructure side, Microsoft built an efficient, isolated code-execution service that can absorb a large volume of requests during training, supporting up to 45,000 concurrent tool calls per training step with an average latency of only 0.3 seconds. Second, Microsoft proposed GRPO-RoC (Group Relative Policy Optimization with a Resample-on-Correct rollout strategy), which oversamples rollouts and then keeps correct traces with the cleanest tool usage alongside a diverse set of failures, so that noisy tool-call errors do not dominate the training signal (a minimal sketch follows below). Finally, Microsoft designed an efficient training recipe for rStar2-Agent, "non-reasoning fine-tuning + multi-stage reinforcement learning," to ensure that the model improves steadily at each stage.
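To make the Resample-on-Correct idea concrete, here is a minimal Python sketch. The `Rollout` record, its field names, the half-and-half split between correct and incorrect traces, and the standalone advantage function are illustrative assumptions for this sketch, not Microsoft's released implementation.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical rollout record; the field names are illustrative, not Microsoft's code.
@dataclass
class Rollout:
    answer_correct: bool    # did the final answer pass verification?
    tool_error_count: int   # failed or malformed tool calls in the trace
    reward: float           # terminal reward, e.g. 1.0 if correct else 0.0

def resample_on_correct(rollouts: List[Rollout], group_size: int) -> List[Rollout]:
    """Downsample an oversampled rollout group in a Resample-on-Correct style.

    Correct rollouts are preferred in order of fewest tool-call errors, while
    incorrect ones are kept at random so the group still contains diverse
    negative examples for the group-relative baseline.
    """
    correct = sorted((r for r in rollouts if r.answer_correct),
                     key=lambda r: r.tool_error_count)
    incorrect = [r for r in rollouts if not r.answer_correct]
    random.shuffle(incorrect)

    half = group_size // 2
    kept = correct[:half] + incorrect[:group_size - half]
    if len(kept) < group_size:  # top up from whichever pool still has rollouts
        spare = correct[half:] + incorrect[group_size - half:]
        kept += spare[:group_size - len(kept)]
    return kept

def group_relative_advantages(group: List[Rollout]) -> List[float]:
    """GRPO-style advantage: reward centered on (and scaled by) group statistics."""
    rewards = [r.reward for r in group]
    mean = sum(rewards) / len(rewards)
    std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
    return [(x - mean) / (std + 1e-6) for x in rewards]

# Example: oversample 16 rollouts, keep a training group of 8.
if __name__ == "__main__":
    sampled = [Rollout(random.random() < 0.4, random.randint(0, 3), 0.0) for _ in range(16)]
    for r in sampled:
        r.reward = 1.0 if r.answer_correct else 0.0
    group = resample_on_correct(sampled, group_size=8)
    print(group_relative_advantages(group))
```

The intent of preferring correct traces with fewer tool-call errors is to keep noisy environment feedback out of the positive training signal while still retaining informative failures for the group-relative baseline.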
This series of technological breakthroughs has enabled rStar2-Agent to stand out in the AI agent field and has opened up new directions for future research and applications of intelligent agents.
Open source address: https://github.com/microsoft/rStar
Key points:
🌟 With only 14 billion parameters, rStar2-Agent reached 80.6% accuracy on the AIME24 math reasoning benchmark, surpassing the 671-billion-parameter DeepSeek-R1.
🔧 Microsoft made innovations in infrastructure, algorithms, and training processes to ensure efficient training and outstanding performance of the model.
📊 rStar2-Agent performs well in science reasoning and tool usage tasks, demonstrating strong generalization capabilities.