On September 19, 2025, Alibaba Cloud announced that Wan2.2-Animate, the new action generation model from Tongyi Wanxiang, is now open-sourced. The model can drive images of people, anime characters, and animals, and can be widely applied to short-video creation, dance template generation, and animation production. Users can download the model and code from GitHub, HuggingFace, and the ModelScope (Moda) community, call the API through the Alibaba Cloud Bailian platform, or try it directly on the Tongyi Wanxiang official website.
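For readers who want to fetch the open weights locally, the following is a minimal sketch using the huggingface_hub library. The repository id comes from the HuggingFace link at the end of this article; the local directory name is an arbitrary choice.

```python
# Minimal sketch: download the Wan2.2-Animate-14B checkpoint from HuggingFace.
# Assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Wan-AI/Wan2.2-Animate-14B",   # repo id from the links below
    local_dir="./Wan2.2-Animate-14B",      # arbitrary local target directory
)
print(f"Model files downloaded to {local_dir}")
```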
The Wan2.2-Animate model is an upgraded successor to the previously open-sourced Animate Anyone model. It significantly improves metrics such as character consistency and generation quality, and supports two modes: action imitation and role-playing. In the action imitation mode, given a character image and a reference video, the model transfers the motions and facial expressions of the character in the video onto the character in the image, giving the still image dynamic expressiveness. In the role-playing mode, the model replaces the character in the video with the character in the image while preserving the original video's motions, expressions, and environment.
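To make the distinction between the two modes concrete, here is an illustrative sketch. The class and function names and the job-spec fields are assumptions made for explanation only; they do not reflect the actual Wan2.2 repository API.

```python
# Illustrative only: hypothetical wrapper showing the two modes described above.
from dataclasses import dataclass


@dataclass
class AnimateJob:
    character_image: str   # still image of the character to animate or insert
    reference_video: str   # driving video that provides motions and expressions
    mode: str              # "animation" (action imitation) or "replacement" (role-playing)


def build_job_spec(job: AnimateJob) -> dict:
    """Build a hypothetical job description for the two modes."""
    if job.mode == "animation":
        # The image character takes on the video character's motions and expressions;
        # the image supplies the character and its setting.
        return {"source": job.character_image,
                "driver": job.reference_video,
                "keep_environment": "image"}
    if job.mode == "replacement":
        # The image character replaces the video character; motions, expressions,
        # and the original video environment are preserved.
        return {"source": job.character_image,
                "driver": job.reference_video,
                "keep_environment": "video"}
    raise ValueError(f"unknown mode: {job.mode}")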
The Tongyi Wanxiang team built a large-scale character video dataset covering speech, facial expressions, and body movements, and post-trained the model on top of the Tongyi Wanxiang image-to-video model. Wan2.2-Animate standardizes character information, environmental information, and actions into a unified representation, so a single model supports both inference modes. For body movement and facial expressions, the model uses skeleton signals and implicit features respectively, combined with a motion retargeting module, to accurately reproduce actions and expressions. For the replacement mode, the team also designed a dedicated relighting LoRA to ensure the lighting of the inserted character blends consistently with the original video.
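The conditioning signals described above can be pictured as a single structured input. The sketch below only mirrors that description; all names are hypothetical and the retargeting function is a toy stand-in for what is in reality a learned module.

```python
# Conceptual sketch of the unified conditioning described above (hypothetical names).
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class UnifiedConditioning:
    character_ref: np.ndarray              # reference character appearance features
    environment_ref: Optional[np.ndarray]  # original video environment, used in replacement mode
    body_skeleton: np.ndarray              # per-frame skeleton signals driving body motion
    face_latents: np.ndarray               # implicit features encoding facial expressions


def retarget_skeleton(driver_skeleton: np.ndarray, scale: float) -> np.ndarray:
    """Toy stand-in for motion retargeting: rescale driver joints to the target
    character's proportions (the real module is learned, not a fixed scale)."""
    return driver_skeleton * scale


def build_conditioning(character_ref, driver_skeleton, face_latents,
                       environment_ref=None, target_scale=1.0) -> UnifiedConditioning:
    # Body motion comes from retargeted skeleton signals; expressions from implicit
    # face features. In replacement mode the environment reference is kept, and a
    # relighting LoRA would further harmonize the character's lighting with the scene.
    return UnifiedConditioning(
        character_ref=character_ref,
        environment_ref=environment_ref,
        body_skeleton=retarget_skeleton(driver_skeleton, target_scale),
        face_latents=face_latents,
    )
```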
Test results show that Wan2.2-Animate surpasses open-source models such as StableAnimator and LivePortrait on key metrics including video generation quality, subject consistency, and perceptual loss, making it, by these measures, the strongest action generation model currently available. In human subjective evaluations, Wan2.2-Animate even outperforms closed-source models such as Runway Act-Two.
GitHub: https://github.com/Wan-Video/Wan2.2
ModelScope: https://modelscope.cn/models/Wan-AI/Wan2.2-Animate-14B
HuggingFace: https://huggingface.co/Wan-AI/Wan2.2-Animate-14B