NVIDIA Research has officially released the Lyra2.0 framework on the Hugging Face platform, marking a new milestone in AI-generated 3D world-building technology. Starting from a single input image, Lyra2.0 can generate large-scale 3D scenes that are persistent, consistent, and freely explorable, supporting real-time rendering, robot simulation, and immersive applications.

AIbase editors believe that this release not only advances the spatiotemporal consistency of video generation models but also provides a practical asset pipeline for physical AI, game development, and virtual environment construction.


Core Challenges and Breakthroughs: Saying Goodbye to Spatial Forgetting and Temporal Drift

Traditional long-horizon video generation models often suffer from "spatial forgetting" (the model cannot remember details of previously generated regions, producing inconsistent scenes) and "temporal drift" (object positions and appearances gradually shift over time), both of which severely degrade subsequent 3D reconstruction.

Lyra2.0 addresses these two major issues with innovative solutions:

  • Spatial Memory Mechanism: The system maintains 3D geometric information for each frame but uses it only for information routing: retrieving relevant historical frames and establishing dense correspondences. Appearance synthesis still relies on strong generative priors, which avoids the accumulation of geometric errors.
  • Self-Enhancing Training Strategy: During training, the model is exposed to its own degraded outputs, teaching it to actively correct drift rather than continue propagating it, thereby achieving longer 3D consistent video trajectories.
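The self-enhancing strategy described above resembles scheduled sampling: during training, some context frames are swapped for the model's own imperfect re-generations so it learns to correct drift rather than propagate it. The paper's exact mechanism is not detailed here, so the following is only a toy sketch; the function name, the `regenerate` callback, and the mixing probability are illustrative assumptions.

```python
import random

def self_enhanced_batch(clean_frames, regenerate, p_degrade=0.5, seed=0):
    """Toy sketch: build a training context where some frames are replaced
    by the model's own (possibly drifted) re-generations, so the model is
    trained to correct drift instead of continuing to propagate it."""
    rng = random.Random(seed)
    context = []
    for frame in clean_frames:
        if rng.random() < p_degrade:
            context.append(regenerate(frame))  # model's own imperfect output
        else:
            context.append(frame)              # ground-truth frame
    return context

# Stand-in "model" whose outputs drift slightly from ground truth.
drifty = lambda x: x + 0.1
mixed = self_enhanced_batch([0.0, 1.0, 2.0, 3.0], drifty, p_degrade=0.5, seed=0)
```

In a real training loop, `regenerate` would be the video model itself run in inference mode; the key idea is simply that the training distribution includes the model's own degraded outputs.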

Through this two-stage design, Lyra2.0 can generate long video sequences from a single image and a user-defined camera trajectory, then reliably lift them into high-quality 3D Gaussian splatting or mesh models that support real-time rendering and further simulation.
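The "information routing" role of the spatial memory can be pictured as a retrieval step: the stored 3D geometry decides which past frames are relevant to the current viewpoint, while the generator handles appearance. This is a minimal sketch under that assumption; ranking by camera-position distance is an illustrative simplification, not Lyra's actual correspondence mechanism.

```python
import math

def retrieve_context(history, cam_pos, k=2):
    """Toy stand-in for geometric information routing: rank stored frames
    by camera-position distance and return the k most relevant ones to
    condition generation on.  Appearance itself would still come from the
    generative model, not from this memory."""
    ranked = sorted(history, key=lambda f: math.dist(f["cam"], cam_pos))
    return ranked[:k]

history = [
    {"id": 0, "cam": (0.0, 0.0, 0.0)},
    {"id": 1, "cam": (5.0, 0.0, 0.0)},
    {"id": 2, "cam": (1.0, 0.0, 0.0)},
]
nearest = retrieve_context(history, cam_pos=(0.5, 0.0, 0.0), k=2)
```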

Usage Process: From Image to Explorable 3D World

  1. Input an image (optional with text prompts);
  2. Define the camera movement trajectory through an interactive 3D browser;
  3. The model autoregressively generates long, camera-controlled video clips;
  4. Lift the video sequence into an explicit 3D representation (point cloud, Gaussian splats, or mesh) and use it for continuous navigation;
  5. Finally, export assets directly usable in environments like Unity, Unreal, and Isaac Sim.
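Step 2 above, defining a camera trajectory, can be sketched as a simple list of camera poses. The pose format below (position plus look-at target on a circular orbit) is a hypothetical illustration; Lyra2.0's interactive browser and actual trajectory format may differ.

```python
import math

def orbit_trajectory(n_frames=8, radius=3.0, height=1.5):
    """Sample a circular camera path around the scene origin -- the kind
    of user-defined trajectory a camera-conditioned video model takes as
    input.  Illustrative format only."""
    poses = []
    for i in range(n_frames):
        theta = 2.0 * math.pi * i / n_frames
        poses.append({
            "position": (radius * math.cos(theta), height, radius * math.sin(theta)),
            "look_at": (0.0, 0.0, 0.0),  # every frame looks at the scene center
        })
    return poses

traj = orbit_trajectory(n_frames=8)
```

A denser trajectory (more frames, smaller angular steps) would correspond to a slower, smoother camera move in the generated video.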

Experiments show that Lyra2.0 outperforms several existing methods, including GEN3C, CaM, and Yume-1.5, on long video generation and 3D scene reconstruction metrics, particularly in scene scale and consistency. Generated scenes can span tens of meters, letting users freely return to previously visited areas, look around, and even deploy robots for real-time interaction.

Open Source and Application Value: Accelerating Physical AI and Virtual World Development

The model weights of Lyra2.0 are now available on Hugging Face (nvidia/Lyra-2.0), and the code repository is on GitHub (nv-tlabs/lyra), under the Apache 2.0 license, which permits commercial use. The underlying video backbone builds on powerful diffusion models such as Wan-14B, and the reconstruction stage integrates tools like Depth Anything V3, ensuring high-quality, practical output.

This framework is particularly suitable for:

  • Embodied AI and robot training: generating consistent simulation environments directly imported into Isaac Sim;
  • Games and Immersive Content: rapidly building exploratory virtual worlds;
  • 3D Asset Generation Pipeline: going from concept art to editable meshes in a single pass.

Compared to earlier versions, Lyra2.0 has made significant progress in scene persistence and scalability, paving the way for "world models" to move from demonstration to practical assets.

AIbase Editors' Comments: NVIDIA's latest open-source release not only demonstrates technical breakthroughs in spatiotemporal modeling with generative AI but also reflects the industry's ongoing commitment to open ecosystems. As tools like Lyra2.0 become more widespread, developers will be able to build large-scale, interactive 3D worlds more efficiently, accelerating the deployment of applications in robotics, autonomous driving, and the metaverse.

The project page, paper, and model are all publicly available. Interested developers can immediately visit Hugging Face and GitHub to experience them.

Paper URL: https://huggingface.co/papers/2604.13036

Model URL: https://huggingface.co/nvidia/Lyra-2.0