The Tongyi Lab's Intelligent Computing team today officially released FIPO (Future-KL Influenced Policy Optimization), a new post-training algorithm for large models. The algorithm introduces a novel "Future-KL" mechanism that targets "inference length stagnation," a common technical bottleneck in pure reinforcement learning (Pure RL) training.

In training for long-text reasoning and complex logical alignment, traditional reinforcement learning often struggles to assign credit accurately to the key decision points in long sequences. FIPO, developed by the Tongyi team, allocates reward differentially across key tokens, guiding the model to look further ahead while generating its chain of thought (CoT).
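The announcement does not spell out FIPO's formulation, so the following is only a minimal illustrative sketch of the general idea of per-token reward weighting driven by a "future" KL term. Everything here is an assumption for illustration: the function names, the definition of a token's future KL as a suffix sum of per-token policy-vs-reference log-probability gaps, and the `beta` scaling factor are not taken from the published algorithm.

```python
def future_kl_weights(logp_policy, logp_ref):
    """Illustrative only: for each position t, accumulate the per-token KL
    proxy (logp_policy - logp_ref) over the suffix t..T-1, so that tokens
    whose *remaining* generation diverges more from the reference policy
    carry a larger 'future-KL' value."""
    per_token_kl = [p - r for p, r in zip(logp_policy, logp_ref)]
    future_kl = [0.0] * len(per_token_kl)
    acc = 0.0
    for t in reversed(range(len(per_token_kl))):  # suffix sum, right to left
        acc += per_token_kl[t]
        future_kl[t] = acc
    return future_kl


def weighted_advantages(advantage, future_kl, beta=0.1):
    """Spread a sequence-level advantage across tokens non-uniformly:
    tokens with larger future divergence get more credit (or penalty).
    The multiplicative (1 + beta * fk) form is a hypothetical choice."""
    return [advantage * (1.0 + beta * fk) for fk in future_kl]


# Toy usage: two tokens, shared sequence-level advantage of 2.0.
fk = future_kl_weights([-1.0, -2.0], [-1.5, -2.5])
adv = weighted_advantages(2.0, fk, beta=0.1)
```

The point of the sketch is only the credit-assignment shape: a uniform sequence-level reward becomes a per-token signal, so early tokens that commit the model to a long divergent continuation are weighted differently from late ones.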

Experimental results show that, in a pure reinforcement learning setting at the 32B scale, the FIPO-trained model already surpasses similarly sized models such as DeepSeek-Zero-MATH and OpenAI's o1-mini, marking substantial progress for domestic large models in logical reasoning and mathematical computation.

Currently, the focus of competition among large models is shifting from pre-training scale to deep alignment on the inference side. The release of FIPO not only offers a new way to assess the quality of a reasoning model's "thinking process," but also signals that the open-source community and leading domestic laboratories are building an independent technological path in their pursuit of world-class reasoning models.