Alibaba International has officially released and open-sourced its next-generation multimodal large model, Ovis2.5. The model is designed around native-resolution visual perception, deep reasoning, and cost-effective deployment. Its overall score on the mainstream multimodal benchmark suite OpenCompass improves significantly over the previous generation, Ovis2, keeping the series at the state-of-the-art (SOTA) level among comparable open-source models.
Ovis2.5 ships in two parameter scales. Ovis2.5-9B scores 78.3 on OpenCompass, surpassing many models with larger parameter counts and ranking first among open-source models under 40B parameters. Ovis2.5-2B scores 73.9, continuing the series' "small size, big power" philosophy, which makes it especially well suited to edge and resource-constrained deployments.
According to the official release, Ovis2.5 introduces systematic innovations in three areas: model architecture, training strategy, and data engineering. Architecturally, it continues the series' structured embedding alignment design, built from three core components: dynamic, native-resolution visual feature extraction; a visual vocabulary module that structurally aligns vision with text; and a Qwen3-based language backbone.
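The article does not detail how the visual vocabulary works internally, but the core idea published for earlier Ovis models is that each visual patch feature is mapped to a probability distribution over a learnable visual embedding table, mirroring how a text token indexes the LLM's embedding table. Below is a minimal PyTorch sketch of that idea; all dimensions, sizes, and names are illustrative assumptions, not the released configuration.

```python
import torch
import torch.nn as nn

class VisualEmbedding(nn.Module):
    """Sketch of Ovis-style structured embedding alignment.

    A ViT patch feature is turned into a probability distribution over a
    learnable visual vocabulary; the visual token embedding is the
    probability-weighted average of the vocabulary's embedding rows.
    Sizes here are hypothetical, chosen only for illustration.
    """

    def __init__(self, feat_dim=1152, vocab_size=65536, llm_dim=4096):
        super().__init__()
        self.to_logits = nn.Linear(feat_dim, vocab_size)  # visual head
        self.vte = nn.Embedding(vocab_size, llm_dim)      # visual embedding table

    def forward(self, patch_features):  # (B, N, feat_dim)
        probs = self.to_logits(patch_features).softmax(dim=-1)  # (B, N, vocab)
        return probs @ self.vte.weight                          # (B, N, llm_dim)

# The resulting visual embeddings would be concatenated with text
# embeddings and fed into the Qwen3 language backbone.
emb = VisualEmbedding()
vis_tokens = emb(torch.randn(1, 256, 1152))
print(vis_tokens.shape)  # torch.Size([1, 256, 4096])
```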
On training strategy, Ovis2.5 adopts a more refined five-stage pipeline, covering basic visual pre-training, multimodal pre-training, and large-scale instruction fine-tuning, among other stages. Algorithms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization) are then used to strengthen preference alignment and reasoning, effectively improving model performance. In addition, the training stack achieves a 3-4x end-to-end speedup.
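The article does not spell out how DPO is applied in Ovis2.5, but for reference, the standard DPO objective it names looks like the following minimal sketch: the policy is pushed to prefer the chosen response over the rejected one, relative to a frozen reference model. Shapes and the beta value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss. Each input is the summed log-probability of a
    full response under the policy or the frozen reference model (one
    scalar per example); beta controls deviation from the reference."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy example with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss.item())
```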
On data engineering, Ovis2.5's training data grows by 50% over the previous generation, with emphasis on key areas such as visual reasoning, charts, OCR (optical character recognition), and grounding. In particular, a large amount of "thinking" data deeply adapted to Qwen3 was synthesized, substantially strengthening the model's reflection and reasoning abilities.
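The actual data schema is not published in this article; purely for illustration, a synthesized "thinking" sample might take a shape like the following, where the `<think>` delimiters follow Qwen3's thinking-mode convention and every field name is a hypothetical assumption.

```python
# Hypothetical shape of a synthesized "thinking" training sample.
# Field names and file paths are invented for illustration; only the
# <think>...</think> convention is borrowed from Qwen3's thinking mode.
sample = {
    "image": "chart_001.png",
    "conversations": [
        {"role": "user",
         "content": "<image>\nWhich quarter had the highest revenue?"},
        {"role": "assistant",
         "content": "<think>The bars rise from Q1 to Q3, then drop in Q4; "
                    "Q3 is the tallest at roughly 4.2M.</think>\n"
                    "Q3 had the highest revenue."},
    ],
}
```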
The code and models of Ovis2.5 are now available on GitHub and Hugging Face, where users can access the relevant resources and further explore the model's application potential.
Code: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/
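As a starting point, here is a minimal loading sketch, assuming Ovis2.5 follows the same `transformers` + `trust_remote_code` pattern as earlier Ovis releases. The repo id below is an assumption (the Hugging Face link above points only to the organization page); check the AIDC-AI organization for the exact model name.

```python
import torch
from transformers import AutoModelForCausalLM

# "AIDC-AI/Ovis2.5-9B" is an assumed repo id; earlier Ovis releases load
# via transformers with trust_remote_code=True, and Ovis2.5 presumably
# follows the same pattern.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2.5-9B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```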
Key Points:
🌟 Ovis2.5-9B scored 78.3 on the OpenCompass evaluation, ranking first among open-source models under 40B parameters and maintaining the SOTA level.
🔧 It includes two versions: Ovis2.5-9B is suitable for large-scale applications, while Ovis2.5-2B focuses on resource-constrained scenarios.
📊 It adopts an innovative architecture and training strategy, with a 50% increase in data volume, focusing on key areas such as visual reasoning.