NVIDIA Launches New Multimodal Model, Boosting AI Agent Efficiency Ninefold
NVIDIA has unveiled Nemotron 3 Nano Omni, an open multimodal model that unifies reasoning over video, audio, images, and text. It uses a 30B-A3B mixture-of-experts architecture with built-in vision and audio encoders, eliminating the need for separate perception models. The design improves inference efficiency at scale, and the model excels at complex text processing.
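
For readers unfamiliar with the "30B-A3B" label, the sketch below illustrates the general idea behind a mixture-of-experts model: a router picks a few experts per token, so only a fraction of the weights run at once. It assumes the common naming convention in which "30B" is the approximate total parameter count and "A3B" the approximate count active per token. The article does not spell this out, and the expert count, gate scores, and `top_k_route` helper are hypothetical illustrations, not NVIDIA's actual configuration or code.

```python
import math


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper for illustration; not taken from any NVIDIA code.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over only the selected experts so their weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Assumed reading of "30B-A3B": ~30B total parameters, ~3B active per token.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Made-up gate scores for one token across four hypothetical experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)

print(f"Share of parameters active per token: {active_fraction:.0%}")
for expert, weight in routes:
    print(f"expert {expert} gets weight {weight:.2f}")
```

Under that assumed reading, roughly one tenth of the parameters participate in each token's forward pass, which is the usual source of the inference savings attributed to mixture-of-experts designs.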