The open-source large model field has a new contender. "Wen Xiao Bai" has officially released its fourth-generation open-source model, XBai o4, which is built for complex reasoning. Its Medium mode is reported to surpass OpenAI o3-mini across the board and to outperform Anthropic's Claude Opus on some benchmarks.

XBai o4 introduces a "reflective generative paradigm" that combines Long-CoT reinforcement learning with process reward learning. The model can therefore both reason in depth and efficiently select among candidate reasoning chains, while significantly reducing inference cost.

Technical Breakthrough: Unique "Reflective Generative Paradigm"

The core innovation of XBai o4 is its "reflective generative paradigm". This paradigm combines Long-CoT reinforcement learning with process reward learning, allowing a single model to perform two key tasks at once:

  1. Deep reasoning: Think through multiple steps like humans.

  2. High-quality reasoning chain selection: Evaluate and select the optimal reasoning path.
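The second task, choosing among candidate reasoning chains, can be illustrated with a minimal sketch. It assumes each candidate chain already carries one process-reward score per step, and that a chain is ranked by the geometric mean of its step scores. The aggregation rule, the function names `chain_score` and `select_best_chain`, and the sample data are all hypothetical; the article does not say how XBai o4 aggregates step scores.

```python
import math


def chain_score(step_scores):
    """Aggregate per-step scores (each in (0, 1]) with a geometric mean.

    Assumption: the geometric mean is only one plausible aggregation rule.
    """
    if not step_scores:
        raise ValueError("a reasoning chain needs at least one step score")
    log_sum = sum(math.log(s) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best_chain(candidates):
    """Return the candidate whose aggregated process score is highest."""
    return max(candidates, key=lambda c: chain_score(c["step_scores"]))


# Hypothetical candidates: three sampled chains with made-up step scores.
candidates = [
    {"answer": "A", "step_scores": [0.9, 0.8, 0.4]},
    {"answer": "B", "step_scores": [0.7, 0.8, 0.9]},
    {"answer": "C", "step_scores": [0.95, 0.3, 0.9]},
]
best = select_best_chain(candidates)
print(best["answer"])
```

A single weak step pulls a chain's geometric mean down sharply, so chains A and C lose to the more consistent chain B even though each contains a higher-scoring individual step.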

More notably, XBai o4 shares a single backbone network between the process reward model (PRM) and the policy model, which cuts the inference time spent on process reward scoring by 99%. This optimization significantly improves the model's operational efficiency and makes it more practical to deploy.
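The idea behind backbone sharing can be shown with a toy sketch: when the policy and the reward scorer read the same hidden state, the expensive forward pass runs once instead of twice. Everything below is a hypothetical stand-in (the `Backbone` class, both heads, and the toy features) and is not the XBai o4 architecture. It only counts forward passes and makes no attempt to reproduce the 99% figure.

```python
class Backbone:
    """Stand-in for the expensive shared network; counts forward passes."""

    def __init__(self):
        self.calls = 0

    def forward(self, tokens):
        self.calls += 1
        # Toy "hidden state": one feature per token (its length).
        return [float(len(t)) for t in tokens]


def policy_head(hidden):
    """Toy policy head: index of the largest feature."""
    return max(range(len(hidden)), key=hidden.__getitem__)


def reward_head(hidden):
    """Toy reward head: squashes the mean feature into (0, 1)."""
    mean = sum(hidden) / len(hidden)
    return mean / (1.0 + mean)


def shared_step(backbone, tokens):
    """One backbone pass feeds both heads."""
    hidden = backbone.forward(tokens)
    return policy_head(hidden), reward_head(hidden)


def separate_step(policy_backbone, prm_backbone, tokens):
    """A standalone PRM needs its own backbone pass."""
    action = policy_head(policy_backbone.forward(tokens))
    score = reward_head(prm_backbone.forward(tokens))
    return action, score


tokens = ["solve", "x", "squared"]

shared = Backbone()
shared_out = shared_step(shared, tokens)

policy_only, prm_only = Backbone(), Backbone()
separate_out = separate_step(policy_only, prm_only, tokens)

print(shared.calls, policy_only.calls + prm_only.calls)
```

Both setups return the same action and score, but the shared version leaves only the lightweight reward head as extra work on top of generation.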

Excellent Performance: Leading in Multiple Benchmark Tests

The XBai o4 model offers three modes (low, medium, high) to suit tasks of different complexity. It posts strong results on several key benchmarks:

  • In Medium mode, XBai o4 fully surpasses the OpenAI o3-mini model.

  • In some benchmark tests, its performance even exceeds that of Anthropic's Claude Opus.

  • The model has demonstrated outstanding reasoning ability in multiple tests such as AIME24, AIME25, LiveCodeBench v5, and C-EVAL.

"Wen Xiao Bai" has open-sourced the training and evaluation code on GitHub. The release gives the AI research community a resource to build on and suggests that open-source large models are quickly becoming more competitive in complex reasoning.

Address: https://github.com/MetaStone-AI/XBai-o4