As AI technology advances, how to give large models "parallel thinking" capabilities has become a hot topic among researchers. Recently, Tencent AI Lab, in collaboration with research teams from multiple universities, introduced a new reinforcement learning (RL) framework called Parallel-R1, designed to teach large models to explore multiple reasoning paths simultaneously. The framework opens a new avenue for tackling complex mathematical reasoning tasks.
Traditional approaches rely on supervised fine-tuning (SFT), which not only demands high-quality data but also tends to make models merely imitate existing traces, limiting autonomous learning and generalization. Parallel-R1 addresses this with a key observation from the research team: simple prompts are enough to elicit high-quality parallel-thinking data from the model on easy math problems. A "progressive curriculum" then trains the model in stages: it first learns the syntax and format of parallel thinking on simple tasks, then gradually transitions to reinforcement learning on more complex math problems.
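The staged schedule described above can be sketched as a simple step router. Note that the phase names, the step threshold, and the routing function itself are illustrative assumptions for this sketch, not values or APIs from the paper.

```python
def curriculum_stage(step: int, sft_steps: int = 500) -> str:
    """Route a training step to a curriculum phase (illustrative sketch).

    Early steps do cold-start SFT on parallel-thinking traces collected
    from easy problems (learning the format); later steps switch to RL
    on harder math problems. The threshold of 500 is an assumption.
    """
    return "sft_easy_parallel" if step < sft_steps else "rl_hard_math"


# Example: the schedule switches phases once, at the threshold.
phases = [curriculum_stage(s) for s in (0, 499, 500, 2000)]
```

In this sketch the curriculum is a hard switch; a real schedule could also mix the two phases or ramp the problem difficulty gradually.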
To handle reward design, the team also proposed an alternating reward strategy that balances problem-solving accuracy against thinking diversity. During training the model primarily receives accuracy rewards, but at intervals it also earns an additional bonus for using parallel thinking. This strategy markedly increases the model's use of parallel thinking and yields gains across multiple mathematical benchmarks.
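One way to read "alternating" is a periodic schedule in which most steps reward accuracy alone and designated steps add a parallel-thinking bonus. The following is a minimal sketch under that assumption; the cycle length, bonus magnitude, and how parallel thinking is detected are all hypothetical, not taken from the paper.

```python
def alternating_reward(step: int, is_correct: bool, uses_parallel: bool,
                       cycle: int = 5, bonus: float = 0.2) -> float:
    """Sketch of an alternating reward schedule (assumed design).

    Accuracy is always rewarded; on every `cycle`-th step, a response
    that uses parallel thinking earns an extra bonus. `cycle` and
    `bonus` are illustrative values.
    """
    reward = 1.0 if is_correct else 0.0
    if uses_parallel and step % cycle == 0:
        reward += bonus
    return reward
```

The periodic bonus nudges the policy toward parallel thinking without letting the diversity signal dominate the accuracy signal, which is the balance the strategy aims for.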
Experimental results show that Parallel-R1 raises average accuracy by up to 8.4% across multiple math benchmarks and delivers a 42.9% performance jump on AIME25. The researchers also observed that, over the course of training, the model's strategy shifts from broad exploration early on to precise verification later, demonstrating the advantages of parallel thinking.
The success of Parallel-R1 not only opens a new direction for large-model reasoning but also offers fresh insights for future AI research, highlighting the potential of parallel thinking for solving complex tasks.