The Tencent Hunyuan team recently announced its latest research on its official WeChat account: SRPO (Semantic Relative Preference Optimization), a method for improving the realism of AI-generated images. It targets in particular the "oily" skin texture of characters generated by the open-source text-to-image model Flux. The technique is expected to have a major impact on the image generation field.

As digital art becomes increasingly popular, the quality of AI-generated images matters more than ever. Flux, a popular base model in the open-source text-to-image community, is often criticized for generating character skin that looks too smooth and unnatural. In joint research with the Chinese University of Hong Kong (Shenzhen) and Tsinghua University, the Tencent Hunyuan team proposed SRPO, which uses methods such as online adjustment of reward preferences and optimization of the generation trajectory to make generated images more realistic.

The core of SRPO is the concept of "semantic preference": the optimization objective of the reward model is adjusted by adding specific control prompts (such as "realism"). Experimental results show that this significantly improves the realism of generated images. However, the researchers also found that a single semantic guidance signal can lead to reward hacking. They therefore introduced the "Semantic Relative Preference Optimization" strategy, which uses positive and negative words as paired guiding signals to balance out the bias of the reward model.
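
The relative-preference idea can be illustrated with a minimal Python sketch. It is not the team's implementation: `toy_reward`, the two-dimensional vectors, and the additive `bias` term are hypothetical stand-ins for a real reward model and its prompt-independent bias. Under that assumption, scoring the same image against a positive and a negative control word and taking the difference cancels the shared bias.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_reward(image_vec, text_vec, bias=0.0):
    # Hypothetical reward: image/text similarity plus a bias that does
    # not depend on the prompt (an assumption made for illustration).
    return cosine(image_vec, text_vec) + bias


def relative_reward(image_vec, pos_text_vec, neg_text_vec, bias=0.0):
    # Score against the positive control word minus the score against
    # the negative control word; the shared bias drops out.
    positive = toy_reward(image_vec, pos_text_vec, bias)
    negative = toy_reward(image_vec, neg_text_vec, bias)
    return positive - negative


image = [1.0, 0.0]          # toy image embedding
pos_prompt = [1.0, 0.0]     # toy embedding for a positive word, e.g. "realistic"
neg_prompt = [0.0, 1.0]     # toy embedding for a negative word, e.g. "oily"

unbiased = relative_reward(image, pos_prompt, neg_prompt, bias=0.0)
biased = relative_reward(image, pos_prompt, neg_prompt, bias=0.5)
print(unbiased, biased)
```

Both calls return the same value even though the second reward model is biased, which is the balancing effect the paired positive and negative words are meant to provide in this simplified setting.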

Notably, traditional generation optimization methods often optimize only the latter half of the generation process, which can easily lead to overfitting on high-frequency information. With its Direct-Align strategy, the Tencent Hunyuan team injects controllable noise into the input image and uses that noise as a reference anchor for image reconstruction, which significantly reduces reconstruction error and transmits the reward signal more accurately. This makes it possible to optimize the first half of the generation trajectory as well, addressing the overfitting problem.
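
Why a known noise sample works as an anchor can be shown with a small Python sketch. It assumes a linear interpolation between image and noise, `x_t = (1 - t) * x0 + t * noise`, which is an assumption of this example rather than something stated above, and it leaves out the denoising model entirely. The function names are hypothetical. The point is only that when the injected noise is known, the original image can be recovered in closed form even at a high noise level.

```python
import random


def add_known_noise(x0, noise, t):
    # Blend the clean image with a known noise sample (assumed linear schedule).
    return [(1.0 - t) * x + t * n for x, n in zip(x0, noise)]


def recover_from_anchor(xt, noise, t):
    # Invert the blend using the same noise sample as the reference anchor.
    return [(x - t * n) / (1.0 - t) for x, n in zip(xt, noise)]


rng = random.Random(0)                       # fixed seed for repeatability
x0 = [0.2, -0.5, 0.9, 0.1]                   # toy "image" of four pixels
noise = [rng.gauss(0.0, 1.0) for _ in x0]    # the controllable, known noise

t = 0.9                                      # an early, very noisy timestep
xt = add_known_noise(x0, noise, t)
x0_hat = recover_from_anchor(xt, noise, t)

max_err = max(abs(a - b) for a, b in zip(x0, x0_hat))
print(max_err)
```

In this toy setting the reconstruction error stays at floating-point precision even at `t = 0.9`, which hints at why an anchor of this kind could let reward signals reach early, noisy steps of the trajectory.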

SRPO is also highly efficient to train, surpassing the existing DanceGRPO method after just 10 minutes of training. According to the research, SRPO improves realism and aesthetic scores more than threefold and cuts training time by a factor of 75 compared to traditional methods. As the technique becomes more widely adopted, the realism of AI-generated images is expected to improve substantially, opening up new possibilities for digital art creation.

Project Address: https://tencent.github.io/srpo-project-page/