OmniGen2 is a powerful and efficient unified multimodal model composed of a 3B vision-language model and a 4B diffusion model, supporting visual understanding, text-to-image generation, instruction-guided image editing, and context generation.
Multimodal
Diffusers