Runway, a leading company in AI video generation, has officially entered the "world model" race. On Thursday, the company launched its first general world model, GWM-1, which it says builds a dynamic simulation environment that understands physical laws and temporal evolution through frame-by-frame pixel prediction. The move places Runway alongside giants such as Google and OpenAI in the competition for the core infrastructure of next-generation embodied intelligence and general artificial intelligence.

A "world model" refers to an AI system's internal simulation of the mechanisms of the real world, allowing it to perform reasoning, planning, and autonomous actions without individual training for each real-world scenario. Runway believes that the optimal path to achieving this goal is to let the model directly learn to predict pixels—that is, to learn physics, lighting, geometry, and causal relationships from video frames. In a live stream, the company's CTO, Anastasis Germanidis, emphasized: "To build a world model, we must first create an extremely powerful video model. With sufficient scale and high-quality data, the model will naturally gain a deep understanding of how the world works."

GWM-1 is not a single product; it is being rolled out through three specialized branches: GWM-Worlds, GWM-Robotics, and GWM-Avatars. GWM-Worlds is an interactive application in which users define an initial scene with a text prompt or image, and the model then generates a dynamic world running at 24 frames per second at 720p resolution. The space not only maintains coherent geometry and lighting logic but also generates new content in real time as users "explore" it. Runway says this capability is suited not only to game development but also to serving as a virtual sandbox for training AI agents to navigate and make decisions in the physical world.
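
The interaction pattern described here (prompt the world, explore it, receive frames as you move) can be sketched as a simple loop. Runway has not published a public interface for GWM-Worlds, so the `WorldSession` class, its methods, and all parameter names below are hypothetical stand-ins used only to illustrate the flow.

```python
# Hypothetical interaction loop for a GWM-Worlds-style session. This is not a
# real SDK; class and method names are invented for illustration.
from dataclasses import dataclass
import time

@dataclass
class Frame:
    index: int
    width: int = 1280   # 720p, as stated in the article
    height: int = 720

class WorldSession:
    """Stand-in for an interactive world-model session (not a real API)."""
    def __init__(self, prompt: str, fps: int = 24):
        self.prompt = prompt
        self.fps = fps
        self._frame_index = 0

    def step(self, action: str) -> Frame:
        # A real system would condition generation on the action (e.g. camera
        # movement) and return the next rendered frame; here we only count frames.
        self._frame_index += 1
        return Frame(index=self._frame_index)

session = WorldSession(prompt="a rainy neon-lit street at night")
actions = ["move_forward", "turn_left", "move_forward"]

for action in actions:
    frame = session.step(action)
    print(f"{action} -> frame {frame.index} ({frame.width}x{frame.height})")
    time.sleep(1 / session.fps)  # pace the loop at the stated 24 fps
```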

In robotics, GWM-Robotics generates synthetic data with controllable variables such as weather changes and dynamic obstacles, helping robots rehearse behaviors in high-risk or hard-to-reproduce real-world scenarios. More importantly, the system can identify the conditions under which a robot might violate safety policies or instructions, offering a new tool for reliability verification. Runway plans to open this module to partner companies through an SDK and says it is in in-depth discussions with multiple robotics companies.
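
Varying weather, obstacles, and lighting across synthetic rehearsal scenarios is essentially domain randomization. The sketch below shows what such scenario configurations could look like; the field names, value ranges, and schema are assumptions for illustration, since Runway has not published the SDK's actual format.

```python
# Sketch of domain-randomized scenario configs for synthetic robot rehearsal data.
# The schema is hypothetical, not GWM-Robotics' real interface.
import random
from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    weather: str
    num_dynamic_obstacles: int
    obstacle_speed_mps: float
    lighting: str
    seed: int

def sample_scenario(seed: int) -> ScenarioConfig:
    """Draw one randomized rehearsal scenario from a fixed seed (reproducible)."""
    rng = random.Random(seed)
    return ScenarioConfig(
        weather=rng.choice(["clear", "rain", "fog", "snow"]),
        num_dynamic_obstacles=rng.randint(0, 8),
        obstacle_speed_mps=round(rng.uniform(0.2, 2.5), 2),
        lighting=rng.choice(["day", "dusk", "night"]),
        seed=seed,
    )

# Generate a small batch of varied scenarios for a robot policy to rehearse in.
for seed in range(5):
    print(sample_scenario(seed))
```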

GWM-Avatars focuses on generating digital humans with realistic behavioral logic for communication and training scenarios, putting it in the same lane as D-ID, Synthesia, Soul Machines, and even Google's digital human projects. Although the three branches are currently independent models, Runway has stated clearly that the ultimate goal is to merge them into a single unified general world model.

At the same time, Runway has significantly upgraded the Gen4.5 video generation model it released earlier this month. The new version supports native audio generation and multi-shot video synthesis up to one minute long while maintaining character consistency, and it can add dialogue and ambient sound effects. Users can also edit the audio of existing videos or make fine-grained adjustments to multi-shot pieces of any length. These capabilities bring Runway's video tools increasingly close to the "integrated video suite" recently launched by Kling and mark AI video generation's transition from creative prototyping to industrial-grade production. The upgraded Gen4.5 is currently available to all paying users.
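
For a sense of what a multi-shot, audio-enabled generation request might look like, here is a sketch of an example payload builder. The endpoint-free structure, field names, and values are all invented for illustration; they are not Runway's documented API for Gen4.5.

```python
# Hypothetical request payload for the Gen4.5 features listed above
# (multi-shot video with native audio). Field names are assumptions.
import json

def build_generation_request(prompt: str) -> dict:
    """Assemble an example payload for a one-minute, multi-shot clip with audio."""
    return {
        "model": "gen4.5",               # model name as written in the article
        "prompt": prompt,
        "duration_seconds": 60,          # up to one minute of multi-shot video
        "shots": [
            {"description": "wide establishing shot of a harbor at dawn"},
            {"description": "close-up of the same fisherman, consistent character"},
        ],
        "audio": {
            "dialogue": True,             # native dialogue generation
            "ambient": "waves and gulls"  # ambient sound effects
        },
    }

print(json.dumps(build_generation_request("a short film about a fisherman"), indent=2))
```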

As world models move from theory into engineering practice, Runway is trying to build a bridge between virtual simulation and real-world action around the philosophy of "pixels as physics." In this vision, AI does not just see and speak; it begins to understand how the world operates.