Bilibili has released AniSora V3, a major update to its open-source anime video generation model. The update improves the quality and smoothness of generated video and broadens the range of anime styles the model can produce, giving creators working in anime, manga, and VTuber content more capable tools.

AniSora V3 lets users generate anime video shots in a variety of styles with one click, covering anime clips, Chinese animated series, manga adaptations, and MAD (Music Anime Douga) videos. Building on earlier AniSora releases, which were based on the open-source CogVideoX-5B and Wan2.1-14B models, V3 adds reinforcement learning from human feedback (RLHF), significantly improving the visual quality and motion consistency of its videos.

Specifically, AniSora V3 introduces a spatiotemporal masking module that helps the model handle complex animation tasks. For example, the prompt "five girls dancing when the camera zooms in" is enough to generate a smooth, natural dance animation in which camera movement and character motion stay well synchronized. V3 also expands its training dataset to more than 10 million high-quality anime video clips, which helps keep style and detail consistent across generated content.

On the hardware side, AniSora V3 now supports Huawei's Ascend 910B NPU, which allows training on Chinese-made chips, and inference is 20% faster. A 4-second video takes just 2 to 3 minutes to generate. V3 also strengthens its multi-task capabilities, supporting video generation from a single image, keyframe interpolation, and lip synchronization, which makes it especially well suited to quickly producing manga adaptations and VTuber content.
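
To make the idea of spatiotemporal masking concrete, here is a minimal Python sketch of how a single mask over time and space could describe tasks such as image-to-video and keyframe interpolation. It is an illustration under stated assumptions, not AniSora's actual implementation: the function names, the 1/0 convention (1 = pixel supplied as guidance, 0 = pixel the model must generate), and the rectangular region format are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical region format: (top, left, bottom, right), bottom/right exclusive.
Region = Tuple[int, int, int, int]
Mask = List[List[List[int]]]  # indexed as mask[frame][row][column]


def build_mask(num_frames: int, height: int, width: int,
               known_frames: Sequence[int],
               region: Optional[Region] = None) -> Mask:
    """Mark guidance pixels with 1 and pixels to be generated with 0."""
    if region is None:
        region = (0, 0, height, width)  # default: the whole frame is guidance
    top, left, bottom, right = region
    known = set(known_frames)
    for t in known:
        if not 0 <= t < num_frames:
            raise ValueError(f"frame index {t} is outside 0..{num_frames - 1}")
    return [
        [
            [1 if (t in known and top <= y < bottom and left <= x < right) else 0
             for x in range(width)]
            for y in range(height)
        ]
        for t in range(num_frames)
    ]


def image_to_video_mask(num_frames: int, height: int, width: int) -> Mask:
    """Only the first frame is given; every later frame is generated."""
    return build_mask(num_frames, height, width, [0])


def keyframe_interpolation_mask(num_frames: int, height: int, width: int) -> Mask:
    """First and last frames are given; the frames in between are generated."""
    return build_mask(num_frames, height, width, [0, num_frames - 1])


def frame_summary(mask: Mask) -> List[int]:
    """Collapse the mask to one flag per frame (1 if any pixel is guidance)."""
    return [int(any(any(row) for row in frame)) for frame in mask]


if __name__ == "__main__":
    print(frame_summary(image_to_video_mask(4, 2, 2)))
    print(frame_summary(keyframe_interpolation_mask(4, 2, 2)))
```

In this sketch, different tasks differ only in which frames and regions are marked as guidance, which is one plausible reason a single masking mechanism can serve several generation modes.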

According to the latest benchmark tests, AniSora V3 reaches industry-leading levels in character consistency and motion smoothness, and it performs especially well on complex animated movement. V3 also introduces an RLHF framework built specifically for anime video generation, aligning generated content more closely with human aesthetic preferences. Developers have already begun building custom plugins on V3 to further improve results for specific anime styles.

Beyond its technical advances, AniSora V3 gives creators a promising platform. Whether for trailers or short animations, it helps users turn their ideas into video quickly.

Open-source repository: https://github.com/bilibili/Index-anisora/tree/main