Physical artificial intelligence has reached a new milestone. On June 1, NVIDIA officially launched Cosmos3, an open world foundation model for physical AI. Billed as the world's first fully open-source, multimodal foundation model for physical AI, it is built on a hybrid Transformer architecture that integrates visual reasoning, world generation, and action prediction in a single system. It has the potential to shorten the training and evaluation cycle for physical AI from months to days.

Cosmos3 targets a long-standing industry problem: physical AI generalizes poorly to real-world scenarios because data is limited and simulation frameworks are fragmented. The model was trained on a vast physical AI dataset containing billions of examples of text, images, video, audio, and motion trajectories. It natively understands and generates cross-modal content and achieves industry-leading physical simulation accuracy.

Architecturally, Cosmos3 combines a reasoning Transformer with a generative Transformer. The model first analyzes the interaction rules, motion states, and spatiotemporal relationships of objects, then generates video and predicts action trajectories. This design gives it three capabilities: multimodal understanding of images and text, simulation and prediction of physical environments, and action policies that help robots complete specific tasks. On several mainstream physical AI benchmarks, including Artificial Analysis, Physics-IQ, and RoboLab, Cosmos3 ranks first among open-source models.

To cover different stages of development, NVIDIA has released multiple versions. Cosmos3Super targets post-training of robotics and autonomous driving models where maximum precision is the priority, while Cosmos3Nano completes high-quality video parsing and action reasoning within seconds. Both versions are available now. A third version, Cosmos3Edge, designed for real-time inference on edge devices, is planned for release.

Alongside the model launch, NVIDIA established the "NVIDIA Cosmos Coalition" together with leading world-model research teams and AI developers from around the globe, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. NVIDIA founder and CEO Jensen Huang said that, with continued breakthroughs in multimodal reasoning and world models, the transformative era of physical AI has arrived. He added that this series of open-source frontier models will help developers worldwide make technological leaps and build the next generation of intelligent systems that can perceive, reason, and act in the real world.