32B Model's Reasoning Performance Surpasses o1-mini! Alibaba Tongyi Launches FIPO Algorithm to Make Large Models Think Deeper
Alibaba's Tongyi Lab has introduced the FIPO algorithm, which overcomes the bottlenecks that traditional reinforcement learning runs into on complex logical reasoning. Through its Future-KL mechanism, FIPO pinpoints the key steps in a reasoning chain, addressing the stagnation models show on tasks such as mathematics and improving both accuracy and efficiency.