On July 4, 2025, the Skywork team open-sourced its second generation of reward models, the Skywork-Reward-V2 series. The series comprises eight reward models built on different base models, with parameter counts ranging from 0.6 billion to 8 billion. Upon release, it took top rankings across seven major reward model evaluation benchmarks, making it a focal point in the open-source reward model field.
Reinforcement learning from human feedback (RLHF) relies heavily on reward models. To train the new generation of reward models, the team built Skywork-SynPref-40M, a hybrid dataset containing 40 million preference pairs. For data curation, the team adopted a two-stage human-machine collaborative pipeline that combines high-quality human annotation with the large-scale processing capability of models. In the first stage, an unverified initial preference pool was constructed, and large language models were used to generate auxiliary attributes. Human annotators then carefully verified a subset of the data, following strict protocols and assisted by external tools and large language models, to build a small, high-quality "gold standard" dataset. Guided by the preferences in the gold-standard data, the team then used large language models to generate large-scale, high-quality "silver standard" data and refined it over multiple iterations. The second stage shifted to fully automated large-scale data expansion, using the trained reward model to perform consistency filtering, which reduced the manual annotation burden while balancing the scale and quality of the preference data.
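The second-stage consistency filtering can be pictured as follows: the reward model trained in the previous round scores both responses in each candidate pair, and only pairs where the model's preference agrees with the silver-standard label are kept. The sketch below is illustrative only; the exact filtering criteria used for Skywork-SynPref-40M are not described here, so the pair structure, score interface, and margin threshold are assumptions.

```python
# Hypothetical sketch of the stage-two consistency filter: the exact rules used
# for Skywork-SynPref-40M are not public here, so the pair structure, the
# score interface, and the `margin` threshold are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response carrying the "silver standard" preferred label
    rejected: str  # response labeled as dispreferred

def consistency_filter(pairs: List[PreferencePair],
                       score_fn: Callable[[str, str], float],
                       margin: float = 0.0) -> List[PreferencePair]:
    """Keep only pairs where the current reward model agrees with the silver
    label by at least `margin`; everything else is treated as label noise."""
    kept = []
    for pair in pairs:
        delta = score_fn(pair.prompt, pair.chosen) - score_fn(pair.prompt, pair.rejected)
        if delta >= margin:
            kept.append(pair)
    return kept
```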
The Skywork-Reward-V2 series, trained on this high-quality mixed preference data, demonstrates broad applicability and strong capability across multiple dimensions, including general alignment with human preferences, objective correctness, safety, resistance to style bias, and best-of-N scalability. It reaches state-of-the-art (SOTA) performance on seven mainstream reward model evaluation benchmarks: RewardBench v1 and v2, PPE Preference, PPE Correctness, RMB, RM-Bench, and JudgeBench. Even the smallest model, Skywork-Reward-V2-Qwen3-0.6B, comes close to the average performance of the previous generation's strongest model, and Skywork-Reward-V2-Qwen3-1.7B already surpasses the prior open-source SOTA. The largest model, Skywork-Reward-V2-Llama-3.1-8B, achieves the best results on all mainstream benchmarks, making it the best-performing open-source reward model currently available.
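As a concrete usage sketch, the released checkpoints can be loaded as scalar-output reward models. The snippet below assumes the V2 models expose the same sequence-classification interface and chat template as earlier Skywork reward models; the prompt and response strings are placeholders.

```python
# Minimal scoring sketch, assuming the V2 checkpoints follow the same
# sequence-classification interface and chat template as earlier Skywork
# reward models; the conversation content is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Skywork/Skywork-Reward-V2-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto", num_labels=1
)

conversation = [
    {"role": "user", "content": "Explain why the sky appears blue."},
    {"role": "assistant", "content": "Sunlight scatters off air molecules, and shorter blue wavelengths scatter the most."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(reward_model.device)

with torch.no_grad():
    # The model emits a single scalar logit per conversation, used as the reward.
    reward = reward_model(input_ids).logits[0][0].item()
print(f"reward score: {reward:.3f}")
```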
The series also shows broad coverage of multi-dimensional human preferences. On general preference evaluation benchmarks, it outperforms several models with more parameters as well as the latest generative reward models. On objective-correctness evaluations, it performs especially well on knowledge-intensive tasks. Across advanced capability evaluations, including best-of-N tasks, bias-resistance tests, complex instruction understanding, and truthfulness judgment, it achieves leading results, demonstrating strong generalization and practicality.
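Best-of-N evaluation simply uses the reward model to pick the highest-scoring response among N sampled candidates. A minimal sketch, with `generate_fn` and `score_fn` as assumed interfaces rather than part of the released code:

```python
# Illustrative best-of-N selection: sample N candidates for a prompt, score
# each with the reward model, and keep the top-scoring one.
from typing import Callable, List

def best_of_n(prompt: str,
              generate_fn: Callable[[str, int], List[str]],
              score_fn: Callable[[str, str], float],
              n: int = 8) -> str:
    """Return the candidate response the reward model ranks highest."""
    candidates = generate_fn(prompt, n)
    return max(candidates, key=lambda response: score_fn(prompt, response))
```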
In addition, the high scalability of the data curation pipeline markedly improves reward model performance. Carefully filtered and screened preference data keeps raising overall model performance through multiple rounds of iterative training, with particularly notable gains in the second, fully automated data-expansion stage. Experiments on an early version showed that training an 8B-scale model on just 1.8% of the high-quality data already exceeded the performance of current 70B-scale SOTA reward models, demonstrating the advantages of the Skywork-SynPref dataset in both scale and quality.
HuggingFace address:
https://huggingface.co/collections/Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84
GitHub address:
https://github.com/SkyworkAI/Skywork-Reward-V2