Ant Group Open-Sources the Omni-Modal Large Model Ming-Flash-Omni 2.0: Comprehensive Enhancements in Multimodal Understanding, Image Editing, and Speech Generation
Ant Group has open-sourced Ming-Flash-Omni 2.0, an omni-modal large model that posts strong results on multiple benchmarks covering vision-language understanding, speech generation, and image processing, with some metrics surpassing Gemini 2.5 Pro. The model introduces unified audio generation across all scenarios: it can generate speech, sound effects, and music within a single audio track. Users can adjust parameters such as voice tone and speaking rate through natural-language instructions.