On September 28, Hugging Face, the world's largest open-source AI community, released its latest model rankings. Alibaba's Tongyi models took seven of the top ten spots in the global open-source model ranking. Among them, the newly open-sourced multimodal large model Qwen3-Omni debuted at the top of the list.
Qwen3-Omni Achieves Industry-First Breakthroughs
Qwen3-Omni is Alibaba's latest open-source multimodal large model, and it achieves 32 state-of-the-art (SOTA) results in audio and video capabilities. The model can process four types of data (text, images, speech, and video) and can "listen, speak, and write" like a human. More importantly, Qwen3-Omni keeps its performance on single-modality text and image tasks stable while achieving strong audio and video capabilities, the first time this training result has been achieved in the industry.
Previously, carrying out complex instructions required multiple models. Now a single Qwen3-Omni model can handle them, which improves the overall experience of interacting with AI. In the future, the model could be deployed in settings such as cars, smart glasses, and mobile phones.
The Tongyi Large Model Family is Flourishing
At the recently concluded 2025 Yunqi Conference (Apsara Conference), Alibaba launched seven models. Besides Qwen3-Omni, six other models of various sizes have also entered the top ten of the Hugging Face global open-source model ranking, including the visual understanding model Qwen3-VL, the image editing model Qwen-Image-Edit-2509, the action generation model Wan2.2-Animate, and the deep research agent model DeepResearch.