Alibaba International Digital Commerce Group's AI Team Releases Ovis2.5: A New Breakthrough in Cost-Efficient Visual Reasoning

AIbase基地

Published in AI News · 3 min read · Aug 18, 2025

The AI team (AIDC-AI) at Alibaba International Digital Commerce Group recently released Ovis2.5, a new multimodal large language model available in 9B and 2B parameter versions. The model is designed as a cost-efficient visual reasoning solution and performs strongly for its size class, positioning it as a new reference point for multimodal AI applications.

Core Features of Ovis2.5

1. **Native-resolution perception**: Ovis2.5 uses a NaViT visual encoder, which processes images at their native resolution and so preserves both fine details and global structure without loss, ensuring high-quality visual processing.

2. **Deep reasoning**: The model supports a "thinking mode" that may draw on some of the technical features of Alibaba's Qwen3. Beyond linear chain-of-thought (CoT) reasoning, Ovis2.5 can also check and correct its own output, and it supports a configurable thinking budget to improve problem-solving accuracy.

3. **Leading chart and document OCR**: At both the 9B and 2B scales, Ovis2.5 ranks among the leading models of comparable size in complex chart analysis, document understanding (including tables and forms), and optical character recognition (OCR), providing strong support for real-world application scenarios.

4. **Broad task coverage**: The model performs well on image reasoning, video understanding, and visual grounding benchmarks, demonstrating strong general multimodal capability.
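To make the idea of native-resolution encoding from item 1 more concrete, here is a minimal sketch of how an image can be cut into a patch grid that follows its own height and width instead of being resized to a fixed square first. It is not the Ovis2.5 implementation: the patch size of 14 pixels, the padding of partial border patches, and the function names `patch_grid` and `token_count` are assumptions made purely for illustration.

```python
import math

PATCH = 14  # hypothetical patch size in pixels, chosen only for illustration


def patch_grid(height: int, width: int, patch: int = PATCH) -> tuple[int, int]:
    """Return (rows, cols) of patches for an image kept at its native size.

    Partial patches at the border are counted as full ones (assumes padding).
    """
    return math.ceil(height / patch), math.ceil(width / patch)


def token_count(height: int, width: int, patch: int = PATCH) -> int:
    """Number of visual tokens an encoder would see for this image."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


if __name__ == "__main__":
    wide_strip = (224, 896)  # a wide table strip, given as (height, width)
    square = (448, 448)      # what a fixed-size square resize would produce
    print(patch_grid(*wide_strip), token_count(*wide_strip))
    print(patch_grid(*square), token_count(*square))
```

In this toy example both inputs yield the same number of patches, but the native grid keeps the strip's 1:4 aspect ratio, whereas squeezing it into a square would distort the text it contains. That is the intuition behind handling charts and documents at their original resolution.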

The release of Ovis2.5 underscores AIDC-AI's continued innovation in multimodal AI. By achieving high performance at a compact model size, Ovis2.5 gives developers and enterprises an efficient, easy-to-deploy option, especially for scenarios that combine visual and text-based reasoning. The model is open source on platforms such as GitHub and Hugging Face, which further supports collaboration and innovation in the global AI community.

The release marks another important step for AIDC-AI within the Ovis series and gives new momentum to the development of multimodal large language models.


This article is from AIbase Daily

