Kunlun Tech has officially launched and open-sourced Skywork UniPic, a unified multimodal pre-trained model built on the autoregressive approach. The model integrates three core functions in a single system: image understanding, text-to-image generation (T2I), and image editing.
The core feature of Skywork UniPic is its end-to-end pre-training on large-scale, high-quality data, which gives it good generality and transferability. In keeping with the team's commitment to open collaboration, the model weights, technical report, and code repository are all available through the links below, so that developers and researchers can explore and build on the model.
Skywork UniPic draws inspiration from the autoregressive paradigm of GPT-4o and combines image understanding, text-to-image generation, and image editing in one unified multimodal architecture. Unlike traditional multimodal models, it adopts a structural design built around a MAR encoder and SigLIP2, aiming to improve performance across understanding, generation, and editing tasks.
In practice, users only need to enter a simple prompt: Skywork UniPic can interpret image content, generate new images, and perform editing operations such as style transfer. Its ease of use and capable feature set have made the model popular among developers.
With a lightweight scale of 1.5B parameters, Skywork UniPic achieves performance close to that of much larger models, reflecting a "small but beautiful" design philosophy. It performs well across evaluations, particularly in instruction following, image generation from complex instructions, and image editing.
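To put the 1.5B figure in perspective, the short sketch below estimates how much memory the weights alone would occupy at a few common precisions. It assumes exactly 1.5e9 parameters and ignores activations, caches, and framework overhead, so the numbers are a rough lower bound rather than a measured footprint of the released checkpoint.

```python
# Rough weight-only memory estimate for a model with about 1.5B parameters.
# Assumption: exactly 1.5e9 parameters; the real count may differ slightly.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(1.5e9, dtype):.1f} GB")
```

At 16-bit precision this works out to about 3 GB for the weights, which is why a model of this size is often described as lightweight.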
To support this performance, the team built a refined data construction and training pipeline, using carefully selected training data and novel reward models to keep improving model quality. Multi-stage training with progressive introduction of tasks strengthens the model's understanding and generation capabilities while mitigating the difficulties of multi-task training.
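The idea of progressive task introduction can be illustrated with a toy schedule in which each training stage keeps the earlier tasks and adds a new one to the sampling mix. The stage names, task order, and mixing weights below are hypothetical and chosen only for illustration; they are not taken from the technical report.

```python
import random
from collections import Counter

# Hypothetical stages: each one keeps earlier tasks and adds a new one.
# The task order and weights are illustrative assumptions, not the team's recipe.
STAGES = [
    {"name": "stage_1", "task_weights": {"t2i": 1.0}},
    {"name": "stage_2", "task_weights": {"t2i": 0.5, "understanding": 0.5}},
    {"name": "stage_3", "task_weights": {"t2i": 0.4, "understanding": 0.3, "editing": 0.3}},
]


def sample_task(stage: dict, rng: random.Random) -> str:
    """Pick the task for one training step according to the stage's mix."""
    tasks = list(stage["task_weights"])
    weights = list(stage["task_weights"].values())
    return rng.choices(tasks, weights=weights, k=1)[0]


def simulate(steps_per_stage: int = 1000, seed: int = 0) -> dict:
    """Count how often each task is drawn in every stage."""
    rng = random.Random(seed)
    return {
        stage["name"]: Counter(sample_task(stage, rng) for _ in range(steps_per_stage))
        for stage in STAGES
    }


if __name__ == "__main__":
    for name, counts in simulate().items():
        print(name, dict(counts))
```

Introducing tasks gradually like this is one common way to keep a newly added objective from destabilizing what the model has already learned.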
The release of Skywork UniPic offers a new option for practical applications of multimodal AI, lowering the technical barrier and encouraging more developers to explore the field.
Model Weights:
https://huggingface.co/Skywork/Skywork-UniPic-1.5B
Technical Report:
https://github.com/SkyworkAI/UniPic/blob/main/UNIPIC.pdf
Code Repository:
https://github.com/SkyworkAI/UniPic
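For readers who want to fetch the weights programmatically, the following minimal sketch derives the repository ID from the model URL above and downloads a snapshot with the `huggingface_hub` package. The helper names and the local directory are illustrative assumptions; how to load and run the model is described in the code repository, not here.

```python
from urllib.parse import urlparse

MODEL_URL = "https://huggingface.co/Skywork/Skywork-UniPic-1.5B"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repository ID from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return "/".join(parts[:2])


def download_weights(url: str = MODEL_URL, local_dir: str = "./unipic-weights") -> str:
    """Download all files of the model repository and return the local path."""
    # Imported lazily so repo_id_from_url works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


if __name__ == "__main__":
    print(repo_id_from_url(MODEL_URL))
    # download_weights()  # uncomment to fetch the files (requires network access)
```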
Key Points:
🌟 Skywork UniPic is an open-source unified multimodal pre-trained model from Kunlun Tech that integrates image understanding, generation, and editing.
💻 The model has a lightweight 1.5B-parameter design with performance close to that of much larger models, making it easy for developers to use.
📊 Through refined data construction and multi-stage training, Skywork UniPic performs well in various evaluations, promoting the development of multimodal artificial intelligence.