On December 2, the Qwen App integrated Wan 2.5, the latest model in the Wanxiang series, further enhancing its video creation capabilities. The model significantly improves motion accuracy and body coordination, and the integration makes the Qwen App the first mobile AI assistant to support synchronized audio and video output.
Alibaba's Wanxiang 2.5 is one of the few video models in the industry that supports synchronized audio and video. It handles both understanding and generation tasks, and it accepts and produces multiple modalities, including text, images, video, and audio. On LMArena, an authoritative evaluation platform for large models, Wanxiang's image-to-video capability ranks third globally and first in China.
In the Qwen App, users need only a photo and a text prompt, with no templates, to generate a 1080p high-definition dance video with natural body movements and accurate lip-sync. Videos can be up to 10 seconds long. Tests show that the Qwen App handles a wide range of subjects, including photos of real people, cute pets, anime characters, cultural relics, and cartoon figures.

Last year, Alibaba's photo-dancing feature quickly became popular in China and abroad after its launch, sparking a wave of creativity among internet users. Videos of dancing Terracotta Warriors, children, and pets spread across the internet. With Wanxiang 2.5 integrated, the Qwen App significantly improves video quality and further lowers the barrier to video creation by letting users upload their own photos and enter their own text. For example, a user can supply an image and a prompt such as "a cat sings and dances," and the Qwen App generates a matching video, bringing the static image to life.
The new feature has once again sparked users' creativity, driving a surge of more inventive "photo dance" content on social platforms. For instance, users can first use the Qwen App to merge two images into a single picture in the style of a medieval painting, then enter a prompt such as "the people in the image sing and dance, with a dynamic camera shot" to produce a group song-and-dance video that maintains high-quality motion and strong subject consistency.
According to reports, the Qwen App exceeded 10 million downloads within a week during its public beta, surpassing ChatGPT, Sora, and DeepSeek to become the fastest-growing AI application in history.





