On April 24th, Kunlun Wanwei announced the official open-sourcing of its multi-modal reasoning model, Skywork-R1V2.0 (hereinafter referred to as R1V2.0). This upgraded version shows significant improvements in both visual and text reasoning capabilities, particularly excelling at deep reasoning over challenging science problems from the National College Entrance Examination (Gaokao) and in general-purpose task scenarios. It is regarded as one of the most balanced open-source multi-modal models currently available, equally adept at visual and textual reasoning.

The open-sourcing of R1V2.0 not only showcases Kunlun Wanwei's technological prowess in the multi-modal field but also provides a powerful tool for global developers and researchers, fostering the development of the multi-modal ecosystem. The model has set new open-source SOTA records in several authoritative benchmark tests, demonstrating capabilities comparable to commercial closed-source models.

Significantly Enhanced Performance, Leading in Chinese Language Scenarios

R1V2.0's performance is particularly outstanding in Chinese language scenarios, especially in reasoning over science problems (mathematics, physics, chemistry), where it can effectively serve as a free AI problem-solving assistant. The model achieved an impressive score of 73.6 on MMMU, setting a new open-source SOTA record, and reached 62.6 on OlympiadBench, significantly outperforming other open-source models. Furthermore, R1V2.0 performed exceptionally well on various visual reasoning leaderboards, including MathVision, MMMU-PRO, and MathVista, with several capabilities now comparable to closed-source commercial models.

In terms of text reasoning, R1V2.0 achieved scores of 78.9 and 63.6 on AIME2024 and LiveCodeBench respectively, demonstrating human-expert-level mathematical and code comprehension abilities. These results show that R1V2.0 excels not only in visual reasoning but in text reasoning as well.


Technical Highlights: Multi-modal Reward Model and Mixed Preference Optimization

The performance improvements in R1V2.0 are attributed to several technical innovations, most notably the newly introduced multi-modal reward model, Skywork-VL Reward, and the mixed preference optimization (MPO) mechanism.

The Skywork-VL Reward model provides high-quality reward signals for multi-modal reinforcement learning, accurately evaluating the overall quality of long-sequence outputs from multi-modal reasoning models. This model achieved a SOTA score of 73.1 on the VL-RewardBench leaderboard for visual reward models and an impressive score of 90.1 on the RewardBench leaderboard for text-only reward models, showcasing its strong generalization capabilities in both multi-modal and text tasks.
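Reward models of this kind are typically trained on preference pairs, learning to assign a higher scalar score to the preferred response. As a rough illustration only (the scoring function below is a stand-in, not the actual Skywork-VL Reward model), the standard pairwise preference loss looks like this:

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(chosen - rejected)).
    Small when the reward model ranks the preferred response higher,
    large when the ranking is inverted."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the reward model correctly scores the preferred answer higher,
# the loss is small; an inverted ranking is penalized heavily.
good_ranking = preference_loss(2.0, 0.5)
bad_ranking = preference_loss(0.5, 2.0)
```

Minimizing this loss over many annotated pairs is what lets a reward model generalize to scoring long multi-modal reasoning outputs it has never seen.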

The MPO mechanism combines multiple loss functions for collaborative optimization, addressing the challenge of balancing "deep reasoning improvement" against "general capability maintenance" in large model training. R1V2.0 leverages the preference signals provided by Skywork-VL Reward to guide preference-consistency optimization, ensuring the model adapts well across a range of tasks and domains. Furthermore, when training deep reasoning capabilities, R1V2.0 employs the rule-based group relative policy optimization (GRPO) method, which compares relative rewards among candidate responses within the same group to guide the model toward more accurate selection and reasoning paths.
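The core idea of GRPO's relative reward comparison can be sketched in a few lines. This is a generic illustration of the group-relative advantage, not code from the Skywork-R1V2.0 training pipeline; the group size and rule-based rewards are assumptions:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Given rewards for a group of candidate responses to the same
    prompt, normalize each reward against the group: responses above
    the group mean receive a positive advantage, those below it a
    negative one. The advantage then weights the policy update."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma if sigma > 0 else 1.0  # all-equal rewards: no signal
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one question, scored by a rule-based check
# (e.g., 1.0 if the final answer is correct, 0.0 otherwise).
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group's own mean, no separate value network is needed: correct responses are reinforced and incorrect ones suppressed purely by comparison within the group.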

Continuous Open-Sourcing, Driving AGI Development

Kunlun Wanwei has long been committed to promoting open-source innovation in artificial intelligence, and the open-sourcing of R1V2.0 marks a significant milestone for the company in the multi-modal field. The model's 38B-parameter weights and a complete technical report have been fully open-sourced on Hugging Face and GitHub, allowing developers to freely access and utilize these resources.

Kunlun Wanwei states that open-source drives innovation, and AGI will eventually arrive. R1V2.0 not only pushes the boundaries of open-source multi-modal large models but also provides a new base model for building multi-modal intelligent agents. In the future, Kunlun Wanwei will continue to adhere to the principles of "open-source, open, and co-creation," continuously releasing leading large models and datasets to empower developers, promote industry collaboration and innovation, and accelerate the realization of Artificial General Intelligence (AGI).

-Model Weights:

Hugging Face - Skywork-R1V2.0-38B

-Code Repository:

GitHub - SkyworkAI/Skywork-R1V

-Technical Report:

https://arxiv.org/abs/2504.16656