Google has launched Veo 3.1, a video generation model that upgrades Veo 3, which was released in May this year. The new version improves audio output, fine-grained editing control, and image-to-video quality, producing more realistic clips that follow user instructions more accurately.

In terms of functionality, Veo 3.1 allows users to add new objects to videos, and the system automatically integrates them into the existing visual style. Google also revealed that it will soon support removing existing objects from videos in its video editing tool Flow, further enhancing editing flexibility.

Veo 3 already offered several editing features: generating characters from reference images, generating the intermediate footage between a given first and last frame, and extending an existing video from its last frame. The core upgrade in Veo 3.1 is that all of these editing features now generate audio as well, so the output clips include sound and feel more complete and immersive.

On the deployment side, Veo 3.1 will be available across multiple platforms. Google is integrating the model into the video editor Flow, the Gemini app, and, for developers, Vertex AI and the Gemini API. According to Google, more than 275 million videos have been created on Flow since its launch in May.

This update reflects the evolution of AI video generation in two directions. One is the continued improvement of generation quality: more realistic visuals and a more accurate understanding of user prompts. The other is the refinement of editing capabilities: a shift from whole-clip generation to local modifications and fine-grained operations such as adding or removing objects. The addition of audio generation also addresses a common shortcoming of AI video tools, which previously lacked sound.

However, in terms of technical maturity, AI video generation is still in a phase of rapid iteration. Video coherence, physical plausibility, and the handling of complex scenes are all areas that models continue to improve. The actual performance of Veo 3.1, including the quality of audio-video synchronization and how naturally added objects blend in, still needs to be verified through real-world use.