SRPO is a human-preference alignment method for diffusion models. Using Direct-Align and Semantic Relative Preference Optimization, it substantially improves the realism and aesthetic quality of images generated by the FLUX.1-dev model, and it addresses two problems: the high computational cost of multi-step denoising and the dependence on offline reward fine-tuning.
Computer Vision
Diffusers