The NVIDIA research team has released OmniVinci, a multimodal understanding model that scores 19.05 points higher than existing top models on key multimodal understanding benchmarks. More impressively, OmniVinci was trained on only about one-sixth of the data those models used, demonstrating strong data efficiency alongside its performance.

OmniVinci aims to be a comprehensive AI system that understands vision, audio, and text simultaneously, letting machines perceive and reason about complex environments through multiple senses, much as humans do. To achieve this, the NVIDIA team combined innovative architectural designs with careful data curation strategies, integrating signals from the different modalities into a unified multimodal latent space that supports cross-modal understanding and reasoning.
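To make the idea of a unified multimodal latent space concrete, here is a minimal PyTorch-style sketch: each modality's encoder output is linearly projected into a shared embedding space and concatenated into a single token stream that a language model can attend over. The module names, dimensions, and concatenation scheme below are illustrative assumptions, not OmniVinci's actual implementation.

```python
import torch
import torch.nn as nn

class UnifiedMultimodalFusion(nn.Module):
    """Illustrative sketch: project per-modality features into one shared
    latent space and concatenate them into a single multimodal token
    sequence. All dimensions and names here are assumptions."""

    def __init__(self, vision_dim=1024, audio_dim=768, latent_dim=4096):
        super().__init__()
        # One projection per modality into the shared latent space.
        self.vision_proj = nn.Linear(vision_dim, latent_dim)
        self.audio_proj = nn.Linear(audio_dim, latent_dim)

    def forward(self, vision_feats, audio_feats):
        # vision_feats: (batch, n_frames, vision_dim)
        # audio_feats:  (batch, n_clips,  audio_dim)
        v = self.vision_proj(vision_feats)
        a = self.audio_proj(audio_feats)
        # One token sequence in the shared latent space for the LLM.
        return torch.cat([v, a], dim=1)

fusion = UnifiedMultimodalFusion()
tokens = fusion(torch.randn(2, 16, 1024), torch.randn(2, 8, 768))
print(tokens.shape)  # torch.Size([2, 24, 4096])
```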


On the DailyOmni benchmark, OmniVinci outperformed Qwen2.5-Omni by the 19.05-point margin cited above; it also scored 1.7 points higher on the MMAR audio understanding test and 3.9 points higher on the Video-MME visual understanding test. It was trained on just 0.2 trillion tokens versus the 1.2 trillion used by Qwen2.5-Omni, meaning OmniVinci reached these results with one-sixth the training data.

The core innovation of the model lies in its multimodal alignment mechanism, which comprises three techniques: the OmniAlignNet module, Temporal Embedding Grouping (TEG), and Constrained Rotary Time Embedding (CRTE). OmniAlignNet exploits the complementarity between visual and audio signals to strengthen the learning and alignment of their embeddings. TEG encodes relative temporal relationships by grouping visual and audio tokens according to their timestamps. CRTE further addresses temporal alignment, ensuring the model can capture the absolute time at which events occur.
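The paper's exact formulations are not reproduced here, but the two temporal mechanisms can be illustrated with a minimal sketch: a TEG-style pass that orders visual and audio tokens into shared time windows, and a CRTE-style rotary rotation driven by each token's absolute timestamp. The window size, frequency schedule, clamping rule, and within-group ordering are all assumptions for illustration.

```python
import torch

def temporal_embedding_grouping(vision_tokens, audio_tokens, window=1.0):
    """TEG-style sketch: interleave visual and audio tokens so that tokens
    from the same time window sit together in the sequence.

    vision_tokens / audio_tokens: lists of (timestamp_sec, embedding) pairs.
    The within-group ordering (vision before audio) is an assumption."""
    tagged = ([(t, 0, e) for t, e in vision_tokens] +
              [(t, 1, e) for t, e in audio_tokens])
    # Sort by time window first, then modality, then exact timestamp.
    tagged.sort(key=lambda x: (int(x[0] // window), x[1], x[0]))
    timestamps = torch.tensor([t for t, _, _ in tagged])
    return torch.stack([e for _, _, e in tagged]), timestamps

def constrained_rotary_time_embedding(tokens, timestamps, max_time=60.0):
    """CRTE-style sketch: rotate each token's feature pairs by an angle
    proportional to its absolute timestamp, with timestamps clamped to a
    constrained range. Frequency schedule and clamp rule are assumptions."""
    d = tokens.shape[-1]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
    angles = timestamps.clamp(max=max_time).unsqueeze(-1) * inv_freq
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = tokens[..., 0::2], tokens[..., 1::2]
    out = torch.empty_like(tokens)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Toy usage: 3 video frames and 2 audio clips with absolute timestamps.
vis = [(0.2, torch.randn(8)), (1.1, torch.randn(8)), (2.3, torch.randn(8))]
aud = [(0.5, torch.randn(8)), (1.9, torch.randn(8))]
seq, ts = temporal_embedding_grouping(vis, aud)
seq = constrained_rotary_time_embedding(seq, ts)
print(seq.shape)  # torch.Size([5, 8])
```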


The research team adopted a two-stage training approach: modality-specific training first, followed by omni-modal joint training, gradually building up the model's omni-modal understanding. In the implicit omni-modal learning stage, the researchers used existing video question-answering datasets to further improve the model's joint audio-visual understanding.
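A minimal sketch of that two-stage schedule follows, assuming a Hugging-Face-style model whose forward pass returns an object with a .loss attribute; the loader layout, learning rate, and epoch counts are placeholders, not the paper's actual recipe.

```python
import torch

def two_stage_training(model, modality_loaders, joint_loader, epochs=(1, 1)):
    """Sketch of the two-stage recipe described above. Assumes model(**batch)
    returns an object with a .loss attribute; all hyperparameters and the
    loader structure are placeholder assumptions."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # Stage 1: modality-specific training (e.g. vision-text, audio-text).
    for _ in range(epochs[0]):
        for loader in modality_loaders.values():
            for batch in loader:
                loss = model(**batch).loss
                opt.zero_grad()
                loss.backward()
                opt.step()

    # Stage 2: omni-modal joint training on data combining all modalities,
    # including implicit omni-modal learning from video-QA datasets.
    for _ in range(epochs[1]):
        for batch in joint_loader:
            loss = model(**batch).loss
            opt.zero_grad()
            loss.backward()
            opt.step()
```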

The release of OmniVinci marks a significant breakthrough for NVIDIA in multimodal AI and is expected to drive the technology's adoption across a range of applications, helping to build smarter systems and services. Its open-source release also gives researchers and developers worldwide new opportunities to explore and innovate with AI in practical settings.