The Tencent Hunyuan 3D team officially announced the open-source release of WorldCompass, the world's first reinforcement learning (RL) post-training framework for world models. As the official RL extension module for Hunyuan World Model 1.5, the framework aims to significantly improve the accuracy and user experience of world models during interaction.

Current mainstream world models rely mainly on large-scale pre-training, but when facing complex composite action instructions from users, they often misinterpret them or execute them inaccurately. WorldCompass provides a new "compass" for solving this pain point.


By introducing a reinforcement learning mechanism, the framework deeply fine-tunes pre-trained models so that they can more accurately interpret and execute complex action instructions, avoiding the embarrassment of "not understanding" a command. Evaluation data shows that after applying WorldCompass, the open-source SOTA model WorldPlay saw its interaction accuracy (Acc_action) in the most difficult composite-action scenarios rise from about 20% to over 55%, an improvement of more than 35 percentage points.

In addition to enhancing action control, the framework also significantly improved the visual fidelity score (HPSv3), ensuring that the model maintains consistent visual quality during long-range, long-horizon exploration of virtual worlds. The Tencent Hunyuan team stated that the release of WorldCompass marks the formal transition of world models from a purely "pre-training era" to a "reinforcement learning fine-tuning era."
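The announcement does not disclose WorldCompass's actual training objective, but the two reported metrics suggest the general shape of such RL post-training: blend an action-accuracy signal with a visual-fidelity signal into one scalar reward, then use it to weight policy-gradient updates. The sketch below is a hypothetical illustration under that assumption; all function names, weights, and the REINFORCE-style baseline are illustrative, not the framework's real API.

```python
# Hypothetical sketch of reward shaping for RL post-training of a world
# model. Assumptions (not from the announcement): a binary action-accuracy
# signal, a visual-fidelity score in [0, 1] (e.g. a normalized HPSv3-like
# value), fixed blend weights, and a REINFORCE-style baseline.

def combined_reward(action_correct: bool, visual_score: float,
                    w_action: float = 0.7, w_visual: float = 0.3) -> float:
    """Blend instruction-following and visual-quality signals into one reward."""
    return w_action * (1.0 if action_correct else 0.0) + w_visual * visual_score

def advantage(reward: float, baseline: float) -> float:
    """Advantage used to scale log-prob gradients in a policy-gradient update."""
    return reward - baseline

# Example rollout: correct action, visual score 0.8, running baseline 0.5.
r = combined_reward(True, 0.8)   # ≈ 0.94
adv = advantage(r, 0.5)          # ≈ 0.44, positive => reinforce this rollout
```

In a real pipeline the advantage would multiply the log-probability gradient of the generated trajectory; a positive advantage pushes the model toward rollouts that both follow the action instruction and stay visually consistent.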

Currently, the relevant technologies of WorldCompass have been validated in the Hunyuan WorldPlay model. Tencent has fully open-sourced the related code and technical reports, aiming to provide a technical path for global developers to build more intelligent and controllable "generative world simulators."

Key Points

  • 🎯 Precision Control: Overcame the industry challenge of inaccurate execution of complex action instructions by world models, more than doubling accuracy in the hardest scenarios.

  • 🤖 Deep RL Empowerment: Demonstrated the significant fine-tuning potential of reinforcement learning for long-horizon, interactive world models.

  • 🌐 Full-Stack Open Source: Everything from code to technical reports is fully open, helping developers build more immersive interactive virtual environments.

  • 🚀 Era Shift: Moves the focus of world-model development from stacking data to refining interaction logic.