Ming-flash-omni (Preview) is a multimodal large model built on the Ling-Flash-2.0 sparse Mixture-of-Experts (MoE) architecture, with 100B total parameters of which only 6B are activated per token. It is a comprehensive upgrade of Ming-Omni, delivering significant improvements in multimodal understanding and generation, particularly in speech recognition, image generation, and segmentation-based editing.
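The key property named above, 100B total parameters with only 6B activated per token, comes from sparse MoE routing: a lightweight router selects a few experts for each token, so most expert parameters sit idle on any given forward pass. The toy PyTorch sketch below illustrates generic top-k routing; the dimensions, expert structure, and k value are illustrative assumptions, not the actual Ming-flash-omni implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Toy top-k sparse MoE layer: each token is routed to k experts,
    so only a small fraction of the layer's parameters fire per token."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                          # (num_tokens, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)   # pick k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Only 2 of 8 experts run per token, mirroring the sparse-activation idea.
layer = SparseMoELayer(d_model=64, d_ff=256, num_experts=8, k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```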