Rhymes AI recently launched Allegro-TI2V, a text-image-to-video generation model that produces video from a text prompt and an initial image. As one of the latest releases in generative AI, Allegro-TI2V gives creators a new tool for visual storytelling and digital content creation.
Allegro-TI2V supports a context length of up to 79.2K tokens, which corresponds to 88 frames of video. Its output resolution is 720×1280 pixels at a frame rate of 15 frames per second, so an 88-frame clip runs for about 6 seconds. Users can also choose to interpolate the output to 30 FPS to suit different application scenarios. The model consists of a VideoVAE with 175 million parameters and a VideoDiT with 2.8 billion parameters, which together condition generation on the user's text prompt and initial image. Allegro-TI2V supports multiple precision modes (FP32, BF16, FP16), and in BF16 mode it requires only 9.3GB of GPU memory to generate videos, which lowers the hardware requirements.
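The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the model's own code: the frame count, resolution, and frame rate come from the article, while the compression factors (4× temporal and 8× spatial in the VideoVAE, followed by 2×2 spatial patches in the VideoDiT) are assumptions that the article does not state. Under those assumptions the token count matches the quoted 79.2K context length.

```python
# Figures quoted in the article.
FRAMES = 88
WIDTH, HEIGHT = 720, 1280
NATIVE_FPS = 15

# ASSUMPTIONS, not stated in the article: how the video is compressed
# into latent tokens before it reaches the diffusion transformer.
TEMPORAL_COMPRESSION = 4   # assumed VideoVAE temporal downsampling
SPATIAL_COMPRESSION = 8    # assumed VideoVAE spatial downsampling
PATCH_SIZE = 2             # assumed VideoDiT spatial patch size


def clip_seconds(frames: int, fps: int) -> float:
    """Duration of a clip in seconds at the given frame rate."""
    return frames / fps


def token_count(frames: int, width: int, height: int) -> int:
    """Number of latent tokens under the assumed compression factors."""
    latent_frames = frames // TEMPORAL_COMPRESSION
    tokens_wide = width // SPATIAL_COMPRESSION // PATCH_SIZE
    tokens_high = height // SPATIAL_COMPRESSION // PATCH_SIZE
    return latent_frames * tokens_wide * tokens_high


print(f"Clip length: {clip_seconds(FRAMES, NATIVE_FPS):.2f} s")  # 5.87 s
print(f"Tokens: {token_count(FRAMES, WIDTH, HEIGHT)}")           # 79200
```

The clip length of roughly 5.9 seconds is consistent with the 6-second videos mentioned in the developers' benchmark figures.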
Allegro-TI2V introduces two generation modes. Subsequent Video Generation creates continuous video content from a text prompt and an initial frame, which helps creators produce videos that match a specified theme and style. Intermediate Video Generation takes the first and last frames of a clip and generates the frames in between, producing a natural transition between the two.
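The difference between the two modes comes down to which frames are supplied as conditioning. The sketch below illustrates that rule only; `GenerationRequest` and `generation_mode` are hypothetical names invented for this example and are not part of the Allegro-TI2V code or any Rhymes AI API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical container for the inputs the article describes."""
    prompt: str
    first_frame: str                  # path to the initial frame image
    last_frame: Optional[str] = None  # path to the final frame, if any


def generation_mode(request: GenerationRequest) -> str:
    """Return which of the two modes a request corresponds to."""
    if not request.prompt:
        raise ValueError("a text prompt is required")
    if not request.first_frame:
        raise ValueError("an initial frame is required")
    # First and last frame given: fill in the frames between them.
    if request.last_frame:
        return "intermediate"
    # Only the first frame given: continue the video from it.
    return "subsequent"


print(generation_mode(GenerationRequest("a boat at sea", "start.png")))
print(generation_mode(GenerationRequest("a boat at sea", "start.png", "end.png")))
```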
Together, these modes give creators a more efficient and flexible way to produce videos, whether they are extending a scene from a single frame or bridging two existing frames.
Rhymes AI has released Allegro-TI2V under the Apache 2.0 license, making it easier for researchers, developers, and content creators to access and use the technology. To get started, users need Python 3.10 or later, PyTorch 2.4 or later, and CUDA 12.4 or later.
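Before downloading the model, it can be useful to confirm that the local environment meets these minimums. The following is a minimal sketch of such a check; the helper names `parse_version` and `check_environment` are made up for this example. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which is assumed here to be a reasonable proxy for the CUDA requirement.

```python
import re
import sys
from importlib import import_module

# Minimum versions quoted in the article.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 4)
MIN_CUDA = (12, 4)


def parse_version(text: str) -> tuple:
    """Extract (major, minor) from strings such as '2.4.1+cu124' or '12.4'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment() -> dict:
    """Report which of the three requirements the current environment meets."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    try:
        torch = import_module("torch")
    except ImportError:
        # PyTorch is not installed, so neither check can pass.
        report["torch"] = False
        report["cuda"] = False
        return report
    report["torch"] = parse_version(str(torch.__version__)) >= MIN_TORCH
    cuda = torch.version.cuda  # None on CPU-only builds of PyTorch
    report["cuda"] = cuda is not None and parse_version(cuda) >= MIN_CUDA
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'does not meet the minimum'}")
```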
Allegro-TI2V has potential applications in film production, game development, digital art, and creative prototyping. According to figures provided by the developers, a single H100 GPU can generate a 6-second video in about 20 minutes, while a configuration of 8 H100 GPUs reduces the generation time to about 3 minutes.
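These two timings also indicate how well generation scales across GPUs. The short calculation below uses only the approximate figures quoted above, so the results are rough estimates rather than measured benchmarks.

```python
# Approximate timings quoted by the developers for a 6-second video.
SINGLE_GPU_MINUTES = 20
MULTI_GPU_MINUTES = 3
GPU_COUNT = 8


def speedup(single_time: float, multi_time: float) -> float:
    """How many times faster the multi-GPU run is."""
    return single_time / multi_time


def parallel_efficiency(single_time: float, multi_time: float, gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup(single_time, multi_time) / gpus


s = speedup(SINGLE_GPU_MINUTES, MULTI_GPU_MINUTES)
e = parallel_efficiency(SINGLE_GPU_MINUTES, MULTI_GPU_MINUTES, GPU_COUNT)
print(f"Speedup: {s:.2f}x")                                    # 6.67x
print(f"Parallel efficiency: {e:.0%}")                         # 83%
print(f"GPU-minutes used: {MULTI_GPU_MINUTES * GPU_COUNT}")    # 24
```

In other words, the 8-GPU configuration is roughly 6.7 times faster but consumes about 24 GPU-minutes per video instead of 20, which is the usual trade-off between latency and total compute.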
Usage link: https://huggingface.co/rhymes-ai/Allegro-TI2V
Product link: https://rhymes.ai/blog-details/allegro-advanced-video-generation-model