AI video generation has hit another milestone. The Lightricks team has officially open-sourced the LTX-2 model, billed as the first complete open-source audio-visual foundation model. It can generate up to 20 seconds of 4K high-definition video in a single pass, with visuals, dialogue, lip movements, ambient sound, and music generated in sync. The AIbase editing team has compiled the latest updates to bring you a comprehensive analysis.
Open-Source Gift Package: Weights and Code Fully Released, Community Celebrates
The LTX-2 model weights, complete training code, benchmarks, and toolkits have all been open-sourced and are hosted on GitHub and Hugging Face, so developers can freely inspect, fine-tune, and deploy the model locally. The model is built on a hybrid DiT architecture and supports text-to-video, image-to-video, multi-keyframe control, 3D camera logic, and LoRA fine-tuning. The latest updates show that ComfyUI shipped native day-0 support for LTX-2 with ready-to-use workflows, greatly lowering the learning curve. Optimization for consumer-grade NVIDIA RTX GPUs has significantly improved generation efficiency, letting ordinary users get professional-level output without enterprise-grade hardware.
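For readers who want to pull the released weights down for local inspection or fine-tuning, a minimal sketch using the Hugging Face Hub client is shown below; the repo id "Lightricks/LTX-2" is an assumption and should be replaced with the actual repository name listed on the official Hugging Face page.

```python
from huggingface_hub import snapshot_download

# Download the released weights to a local folder for inspection or fine-tuning.
# "Lightricks/LTX-2" is a placeholder repo id; check the official Hugging Face
# page for the actual repository name before running.
local_dir = snapshot_download(
    repo_id="Lightricks/LTX-2",
    local_dir="./ltx-2-weights",
)
print(f"Weights downloaded to: {local_dir}")
```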

Core Highlights: Audio and Video Combined, Synchronized Generation Without Post-Processing
Unlike traditional models that stitch audio onto video after the fact, LTX-2 generates visual and audio elements jointly in a single pass, so actions, dialogue, ambient sound effects, and music stay naturally aligned. It supports native 4K resolution, frame rates up to 50fps, and continuous clips of up to 20 seconds. Hands-on testing shows strong lip-syncing and expression rendering, with highly realistic character dialogue scenes. The model also stays consistent under complex prompts, with noticeably better skin texture and motion smoothness than most open-source competitors. Input is flexible: generation can be driven by text, images, or sketches, making it suitable for short films, advertisements, and content creation.
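As a rough illustration of text-driven generation, the sketch below follows the existing diffusers LTX-Video pipeline pattern. Whether LTX-2 exposes the same LTXPipeline interface, the repo id, and how the jointly generated audio track is returned are all assumptions here, so treat this as a sketch to check against the official examples rather than a definitive implementation.

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Assumption: LTX-2 loads through the same LTXPipeline class used by LTX-Video;
# the repo id below is a placeholder. Audio output handling is omitted because
# the exact return format is not documented here.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A woman speaks to the camera in a sunlit kitchen, natural lip movement",
    negative_prompt="blurry, distorted, low quality",
    width=704,            # native resolutions go up to 4K; smaller sizes generate faster
    height=480,
    num_frames=121,       # frame count at the model's frame rate sets the clip length
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "ltx2_sample.mp4", fps=24)
```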
Performance Optimization: Faster, More Efficient, and Friendly for Local Execution
Compared with previous versions and some competitors, LTX-2 cuts computational cost by up to 50% and supports long-sequence extension with multi-GPU inference stacks. Quantized versions further reduce GPU memory requirements, running smoothly on RTX 40-series and newer GPUs. Community feedback indicates that a 10-20 second video takes only a few minutes to generate, making near-real-time preview practical. This marks a shift in high-end AI video generation from closed cloud systems to democratized local open source, significantly lowering the barrier for creators.
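For running on consumer GPUs, a common memory-saving pattern with diffusers is sketched below; it is not specific to LTX-2 and assumes the same hypothetical pipeline and repo id as above.

```python
import torch
from diffusers import LTXPipeline

# Assumption: the LTX-2 checkpoint loads via LTXPipeline; the repo id is a placeholder.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)

# Offload submodules to CPU when they are not in use, trading some speed
# for a much smaller peak VRAM footprint on RTX 40-series cards.
pipe.enable_model_cpu_offload()

# Decode frames in tiles so high-resolution output does not have to fit
# through the VAE in one pass.
pipe.vae.enable_tiling()
```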
Unlimited Application Potential: From Personal Creation to Professional Production
LTX-2 has shown strong potential in content creation, animation, marketing, and film previsualization. It supports video-to-video control signals such as Canny, Depth, and Pose, which combine with keyframe-driven generation for precise storytelling and consistent style. With community LoRAs and plugin extensions, the model could become the core engine of the open-source AI video ecosystem, driving innovation from short-form to long-form content.
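As an illustration of what a Canny control input looks like, here is a standard OpenCV edge-map extraction from a reference frame. How the resulting map is fed into an LTX-2 video-to-video workflow (for example through a ComfyUI control node) is not shown here and would follow the official tooling.

```python
import cv2

# Extract a Canny edge map from a reference frame; this kind of structural map
# is the control signal a Canny-conditioned video-to-video pass consumes.
frame = cv2.imread("reference_frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("canny_control.png", edges)
```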
AIbase Perspective: The open-sourcing of LTX-2 is not only a technological leap but also a critical step toward democratizing AI video. It fills the gap in open-source joint audio-visual generation and may accelerate the adoption of local AI tools. AIbase will continue to follow its community development and practical applications; stay tuned for our follow-up reports.



