Kunlun Wanwei Group announced on the third day of its SkyWork AI Technology Release Week that it has open-sourced its newly developed "Skywork UniPic2.0" model. The company presents this unified multimodal model as a further advance in multimodal artificial intelligence. Skywork UniPic2.0 is an efficient training and inference framework for unified multimodal modeling. It keeps the generation and editing module lightweight and trains it jointly with a multimodal understanding model, so that a single model covers understanding, image generation, and image editing. The stated goal is a "high-efficiency, high-quality, unified" multimodal generation model.

Skywork UniPic2.0 consists of three core parts: the image generation and editing module, the unified model capability, and post-training for image generation and editing. The generation and editing module is based on the SD3.5-Medium architecture. It has been extended from accepting only text input to accepting both text and image input, which turns a pure image generator into a model that can both generate and edit images. To build the unified capability, the image generation and editing module is first frozen and a connector is pre-trained to link it to the multimodal model Qwen2.5-VL-7B. The connector and the image generation and editing module are then fine-tuned jointly, producing the final integrated model for understanding, image generation, and editing.
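The two training steps for the unified model can be summarized as a small schedule of which components are updated in each step. The following is only an illustrative sketch, not the project's actual training code: the component names, the `Stage` class, and the `frozen_components` helper are hypothetical, and the sketch assumes that the understanding model (Qwen2.5-VL-7B) is not updated in either step, which the announcement does not state explicitly.

```python
from dataclasses import dataclass

# Hypothetical component names chosen for illustration; the announcement
# names the parts of the model but gives no code identifiers.
COMPONENTS = ("understanding_model", "connector", "gen_edit_module")


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset  # components whose weights are updated in this stage


SCHEDULE = [
    # Step 1: the generation/editing module is frozen and only the
    # connector is pre-trained.
    Stage("pretrain_connector", frozenset({"connector"})),
    # Step 2: the connector and the generation/editing module are
    # fine-tuned jointly.
    Stage("joint_finetune", frozenset({"connector", "gen_edit_module"})),
]


def frozen_components(stage: Stage) -> list:
    """Return the components that stay fixed during the given stage."""
    return sorted(set(COMPONENTS) - stage.trainable)


if __name__ == "__main__":
    for stage in SCHEDULE:
        print(stage.name, "frozen:", frozen_components(stage))
```

In a real training framework, the same schedule would typically be applied by disabling gradient updates for the frozen components before each step begins.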

Skywork UniPic2.0 is released as a complete open-source package for developers and researchers, including model weights, inference code, and the reinforcement strategy. Its generation module is built on the 2B-parameter SD3.5-Medium architecture, and according to the company it achieves image generation and editing metrics that surpass those of other models with larger parameter counts. The model also introduces reinforcement learning through a progressive dual-task reinforcement strategy based on Flow-GRPO, which improves its understanding of complex instructions and its consistency in image generation and editing.

Project Homepage:

https://unipic-v2.github.io/

Technical Report:

https://github.com/SkyworkAI/UniPic/blob/main/UniPic-2/assets/pdf/UNIPIC2.pdf

GitHub Repository:

https://github.com/SkyworkAI/UniPic/tree/main/UniPic-2

HuggingFace Gradio:

https://huggingface.co/spaces/Skywork/UniPic2-Metaquery

HuggingFace Models:

https://huggingface.co/Skywork/UniPic2-SD3.5M-Kontext-2B

https://huggingface.co/Skywork/UniPic2-Metaquery-9B