Recently, Xiaomi's MiMo-VL multimodal model has taken up the baton from MiMo-7B, showcasing strong capabilities across multiple domains. The model significantly outperforms Qwen2.5-VL-7B, a benchmark multimodal model of the same size, on tasks such as general question answering and comprehension and reasoning over images, video, and language. Its performance on GUI grounding tasks is on par with that of specialized models, laying the groundwork for the coming era of agents.