In the field of artificial intelligence, reward models are a critical component for aligning large language models (LLMs) with human feedback, but existing models are prone to the "reward hacking" problem.

These models often latch onto superficial features, such as the length or formatting of responses, rather than genuine indicators of quality, such as factual accuracy and relevance. The root cause is that standard training objectives cannot distinguish the spurious correlations present in the training data from the true causal drivers of quality. The result is fragile reward models (RMs) that produce misaligned policies when used for reinforcement learning. Addressing this requires an approach that builds causal understanding into RM training, making models sensitive to genuine causal quality attributes and robust against a variety of spurious cues.


Existing attempts to mitigate reward hacking in standard RLHF pipelines built on Bradley-Terry or pairwise ranking objectives include architectural modifications, policy-level adjustments, and data-centric methods such as ensembles or consistency checks. Recent causally inspired heuristics use MMD regularization against pre-specified spurious factors, or corrective rewrites to estimate causal effects. However, these methods only target spurious factors that are known in advance and miss unknown correlates; augmentation strategies remain relatively coarse, and evaluation-centered approaches do not give reward models a robust training mechanism for handling diverse spurious variations.
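For context, the standard pairwise objective underlying these pipelines is the Bradley-Terry loss, which simply pushes the reward of the chosen response above that of the rejected one and therefore absorbs any feature, causal or spurious, that happens to correlate with the preference labels:

```latex
\mathcal{L}_{\mathrm{BT}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \bigl[\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)\bigr]
```

Here r_θ(x, y) is the scalar reward assigned to response y for prompt x, y_w and y_l are the preferred and rejected responses, and σ is the sigmoid function.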

To address these challenges, researchers from Google DeepMind, McGill University, and Mila – Quebec AI Institute proposed Crome (Causally Robust Reward Modeling). The Crome framework is built on an explicit causal model of answer generation: it trains RMs on a preference dataset augmented with targeted, LLM-generated counterfactual examples, teaching them to separate genuine quality drivers from surface cues. Specifically, Crome creates two types of synthetic training pairs, causal augmentations and neutral augmentations (sketched below), which improve the model's robustness and its accuracy on reward modeling benchmarks.
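As an illustration of the idea only (the helper `llm_rewrite`, the rewrite prompts, and the data class below are hypothetical, not the paper's pipeline), a causal augmentation corrupts a genuine quality attribute so that the rewritten answer must lose, while a neutral augmentation varies only a spurious attribute and is labeled as a tie:

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    tie: bool = False  # True => the RM should score both responses (nearly) equally


def llm_rewrite(text: str, instruction: str) -> str:
    """Placeholder for a call to an LLM that rewrites `text` as instructed."""
    raise NotImplementedError


def causal_augmentation(prompt: str, chosen: str) -> PreferencePair:
    # Degrade a causal quality attribute (here: factual accuracy) while holding
    # style and length fixed, so only genuine quality separates the pair.
    corrupted = llm_rewrite(
        chosen, "Introduce a factual error, but keep the style and length unchanged.")
    return PreferencePair(prompt, chosen=chosen, rejected=corrupted)


def neutral_augmentation(prompt: str, chosen: str) -> PreferencePair:
    # Change only a spurious attribute (here: verbosity) while preserving content,
    # and mark the pair as a tie so the RM learns to ignore the difference.
    restyled = llm_rewrite(
        chosen, "Rewrite this answer more verbosely without changing its content.")
    return PreferencePair(prompt, chosen=chosen, rejected=restyled, tie=True)
```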

Crome operates in two main stages: generating attribute-aware counterfactual data based on the causal model, and training the reward model with a specialized loss on the combined data. In evaluations using several base LLMs, including Gemma-2-9B-IT and Qwen2.5-7B, the researchers observed significant performance improvements.
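A minimal sketch of what such a specialized loss could look like in PyTorch, assuming (as one plausible reading, not the paper's exact formulation) that original and causally augmented pairs use the usual Bradley-Terry objective while tie-labeled neutral pairs are pushed toward equal rewards:

```python
import torch
import torch.nn.functional as F


def reward_modeling_loss(r_chosen: torch.Tensor,
                         r_rejected: torch.Tensor,
                         is_tie: torch.Tensor) -> torch.Tensor:
    """r_chosen, r_rejected: [B] reward scores; is_tie: [B] boolean mask."""
    margin = r_chosen - r_rejected
    # Decisive pairs (original data + causal augmentations):
    # standard Bradley-Terry loss, -log sigmoid(margin).
    bt_loss = F.softplus(-margin)
    # Neutral pairs (spurious-only differences): target a 50/50 preference,
    # i.e. cross-entropy against 0.5 on the margin logit, minimized at margin = 0.
    tie_loss = F.binary_cross_entropy_with_logits(
        margin, torch.full_like(margin, 0.5), reduction="none")
    return torch.where(is_tie, tie_loss, bt_loss).mean()
```

In practice, the reward scores would come from a single RM head applied to (prompt, response) pairs drawn from the combined original and augmented dataset.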

Crome performs strongly across multiple benchmarks, with particularly notable gains in the safety and reasoning categories. It also does well on WildGuardTest, lowering the attack success rate on harmful prompts while keeping the refusal rate on benign prompts roughly unchanged.

Looking ahead, the research direction for Crome centers on causal data augmentation, which could advance synthetic data generation and open new possibilities for training base models.

Paper: https://arxiv.org/abs/2506.16507

Key Points:

🌟 The Crome framework was proposed by institutions such as Google DeepMind, aiming to enhance the robustness of reward models.  

📈 Crome significantly improves model performance across multiple tasks through its causal and neutral augmentation strategies.  

🔒 Crome performs strongly in safety evaluations, reducing attack success rates and improving model reliability.