NVIDIA Launches OmniVinci, a Multimodal Understanding Model That Sets a New SOTA, Beating Top Models by 19.05 Points
NVIDIA has released OmniVinci, a multimodal understanding model that outperforms top models by 19.05 points in benchmark tests while using only one-sixth of the training data. The model is designed to let AI systems understand vision, audio, and text simultaneously, mimicking the way humans perceive the world through multiple senses.