Zhejiang University and Alibaba have jointly released OmniAvatar, a new audio-driven model that marks a step forward in digital human technology. The model takes audio as its driving signal and generates natural, fluid full-body digital human videos. It performs especially well in singing scenarios, where lip movements stay accurately synchronized with the audio, producing a realistic effect.

OmniAvatar supports fine-grained control over generated details through text prompts: users can customize the character's range of motion, the background environment, and the emotional expression, which gives the model considerable flexibility. It can also generate videos of virtual characters interacting with objects, opening up commercial applications in areas such as e-commerce and marketing. A brand could, for example, use OmniAvatar to produce dynamic advertisements that improve consumer engagement.
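
To make the prompt-based control concrete, here is a minimal Python sketch of how such a request might be assembled. It is illustrative only: `AvatarRequest`, `build_prompt`, and the `pipeline.generate` call mentioned in the comments are hypothetical names invented for this example, not the project's documented interface.

```python
# Hypothetical sketch of prompt-controlled, audio-driven generation.
# AvatarRequest, build_prompt(), and the generate() call referenced in
# the comments are illustrative placeholders, not OmniAvatar's actual API.

from dataclasses import dataclass


@dataclass
class AvatarRequest:
    """Bundles the inputs described in the article: driving audio plus
    a text prompt that constrains motion, background, and emotion."""
    audio_path: str        # driving audio (e.g. speech or singing)
    prompt: str            # free-text control over the generated video
    reference_image: str   # identity/appearance of the digital human


def build_prompt(action: str, background: str, emotion: str) -> str:
    """Compose a control prompt covering the three aspects the article
    says users can customize."""
    return f"A person {action}, with a {emotion} expression, in a {background}."


if __name__ == "__main__":
    request = AvatarRequest(
        audio_path="song.wav",
        prompt=build_prompt(
            action="singing and swaying gently",
            background="concert stage with soft lighting",
            emotion="joyful",
        ),
        reference_image="avatar.png",
    )
    print(request.prompt)
    # A real pipeline would now consume `request` and render the video,
    # e.g. pipeline.generate(request) -> output.mp4 (hypothetical call).
```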

Released as an open-source project on GitHub, OmniAvatar has drawn attention from developers worldwide. Its performance in facial-expression, upper-body, and full-body animation generation reportedly surpasses that of existing comparable models. The model also supports multi-scene applications, including podcast programs, interpersonal interaction, and dynamic performances, pointing to significant potential in content creation.

Industry experts say that the release of OmniAvatar not only improves the realism and controllability of audio-driven digital human technology but also encourages innovative applications of AI in fields such as marketing, education, and entertainment. Going forward, Zhejiang University and Alibaba plan to deepen their cooperation and explore further possibilities in multimodal AI.