In AI image generation, style-driven and theme-driven (subject-driven) generation have long been treated as two largely independent tasks. The former prioritizes similarity to a reference style, while the latter prioritizes keeping the theme consistent, and the two goals tend to pull against each other. Recently, the UXO Team at ByteDance's Intelligent Creation Lab introduced USO (Unified Style-Subject Optimized), a model that aims to resolve this long-standing tension by handling both tasks in a single framework.


Because a model of this kind depends heavily on its training data, the ByteDance researchers first built a large dataset of approximately 200,000 triplets. Each triplet consists of a "style reference image," a "content reference image," and a "stylized target image." Training on these triplets lets the model learn how to combine the style of one image with the content of another.
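To make the triplet structure concrete, here is a minimal Python sketch of how one training record might be represented. The class name, field names, and file paths are illustrative assumptions and are not taken from the USO codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleContentTriplet:
    """One training record (hypothetical layout, not USO's actual schema)."""
    style_ref: str    # path to the style reference image
    content_ref: str  # path to the content reference image
    target: str       # path to the stylized target image


def load_triplets(rows):
    """Turn (style, content, target) path tuples into triplet records."""
    return [StyleContentTriplet(*row) for row in rows]


# Placeholder paths, for illustration only.
rows = [
    ("styles/ink_wash.png", "content/cat.png", "targets/cat_ink_wash.png"),
    ("styles/pixel_art.png", "content/cat.png", "targets/cat_pixel_art.png"),
]
triplets = load_triplets(rows)
```

Pairing the same content image with different style references, as in the placeholder rows, is one way such a dataset could show a model what changes with style and what stays fixed.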

For training, ByteDance adopted a two-stage method. The first stage focuses on learning style, using image encoders to help the model capture artistic style at a deeper level than surface texture. The second stage adds content information so that the theme of the content reference is preserved. Because style and content are learned separately, the model can combine them more cleanly when generating images.

To further improve the model's performance, the ByteDance team also introduced a style reward learning (SRL) mechanism, which uses reinforcement learning to encourage the model to match the reference style as closely as possible while keeping the theme unchanged. Together, these techniques allow USO to generate images with high flexibility and precision.
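The idea of a style reward can be illustrated with a toy Python sketch. It assumes that style is summarized as an embedding vector and uses cosine similarity as a stand-in reward. The actual reward model and training objective used by USO are not described here, so every function name and the weighting scheme below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_reward(generated_style_vec, reference_style_vec):
    """Stand-in reward: higher when the generated style is closer to the reference."""
    return cosine_similarity(generated_style_vec, reference_style_vec)


def reward_adjusted_loss(base_loss, reward, weight=0.1):
    """Assumed combination: subtract a weighted reward from the base training loss."""
    return base_loss - weight * reward


# Toy vectors: the first pair shares a style direction, the second does not.
close_match = style_reward([1.0, 0.0], [2.0, 0.0])
no_match = style_reward([1.0, 0.0], [0.0, 3.0])
```

In this sketch, a generation whose style embedding points in the same direction as the reference earns a higher reward and therefore a lower adjusted loss.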

To verify USO's capabilities, ByteDance also released USO-Bench, which it describes as the first benchmark able to evaluate style similarity and theme fidelity at the same time. On this benchmark, USO is reported to hold a clear lead across the evaluated dimensions, surpassing existing open-source models.
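Evaluating both axes at once can be pictured with a short Python sketch that keeps a style score and a theme score for each sample and reports them side by side. The field names and numbers are placeholders for illustration. They are not USO-Bench metrics or results.

```python
def summarize(samples):
    """Average style and theme scores separately so neither hides the other."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    return {
        "style_similarity": sum(s["style_similarity"] for s in samples) / n,
        "theme_fidelity": sum(s["theme_fidelity"] for s in samples) / n,
    }


# Placeholder per-sample scores in [0, 1], invented for this example.
samples = [
    {"style_similarity": 0.5, "theme_fidelity": 1.0},
    {"style_similarity": 1.0, "theme_fidelity": 0.5},
]
report = summarize(samples)
```

Reporting the two averages separately, rather than as a single blended number, makes it visible when a model gains on style at the expense of the theme, or the reverse.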

USO's technology not only performs well in the digital art field but also brings new possibilities for commercial design. Brands can use USO to generate marketing materials with diverse yet unified styles, meeting the needs of different platforms. More importantly, USO has been fully open-sourced, encouraging more developers and creators to explore its potential.

GitHub: https://github.com/bytedance/USO

Demo: https://huggingface.co/spaces/bytedance-research/USO

Key Points:

- 🎨 The USO model introduced by ByteDance breaks down the opposition between style and theme, handling both within a single model.

- 📊 The USO model enhances the flexibility and accuracy of image generation through innovative training methods and a large dataset.

- 🌍 USO is fully open-sourced, encouraging developers to explore its applications in creative content and commercial design.