On August 31, the Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab) announced the open-source release of InternVL3.5 (Chinese name: Shusheng · Wanxiang), a multimodal large model. Through innovative cascade reinforcement learning (Cascade RL), dynamic visual resolution routing, and a decoupled deployment architecture, the model delivers across-the-board upgrades in reasoning ability, deployment efficiency, and general capability. The InternVL3.5 series spans model sizes from 1B to 241B parameters, setting a new performance benchmark for open-source models and achieving leading results across a range of tasks.

The flagship model, InternVL3.5-241B-A28B, achieved the highest open-source score of 77.7 points on the multidisciplinary reasoning benchmark MMMU. It scored 77.9 points on the multimodal general perception benchmark MMStar and 90.7 points on OCRBench, surpassing GPT-5's 75.7 and 80.7 points on those two benchmarks. On the text reasoning benchmarks AIME25 and MMLU-Pro, it reached 75.6 and 81.3 points respectively, comprehensively outperforming existing open-source multimodal large models. Thanks to the cascade reinforcement learning framework, overall reasoning performance across the series improved by an average of 16.0 points over the previous generation. In particular, the comprehensive reasoning score of InternVL3.5-241B-A28B reached 66.9 points, exceeding both the previous generation's 54.6 points and Claude-3.7-Sonnet's 53.9 points, with strong results on complex tasks such as mathematical and logical reasoning.

With the innovative visual resolution routing (ViR) and the decoupled vision-language deployment framework (DvD), the 38B model's response speed at 896-pixel input resolution improved markedly: single-inference latency dropped from 369 ms to 91 ms, a speedup of roughly 4x. Meanwhile, the lightweight InternVL3.5-Flash variant halves the visual sequence length while retaining nearly 100% of the full model's performance.

InternVL3.5 also strengthens core agent capabilities, including GUI agents, embodied agents, and SVG graphics understanding and generation. It surpasses mainstream open-source models on tasks such as ScreenSpot GUI grounding (92.9 points), VSI-Bench spatial reasoning (69.5 points), and SGP-Bench vector graphics understanding (70.6 points).

InternVL3.5 provides nine model sizes ranging from 1 billion to 241 billion parameters, covering different resource scenarios, and includes both dense and mixture-of-experts (MoE) models. It is the first open-source multimodal large model family to support the GPT-OSS language model as a base. The official repository provides example code for running InternVL3.5-8B with `transformers` (see the sketch below). The 8B model can be deployed on a single A100 GPU, the 38B model requires 2 A100 GPUs, and the 241B model requires 8 A100 GPUs.
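For reference, here is a minimal sketch of what such `transformers` usage typically looks like for InternVL models. The model ID, the `model.chat` interface, and the image-loading helper follow the conventions documented in the InternVL repository; check the official example code for the exact names and arguments.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Model ID assumed from the InternVL naming convention; verify against the repo.
path = "OpenGVLab/InternVL3_5-8B"

# trust_remote_code loads InternVL's custom modeling code from the hub.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

generation_config = dict(max_new_tokens=512, do_sample=False)

# Text-only chat: pixel_values is None. For image inputs, the repo's
# load_image() helper produces the pixel_values tensor passed here instead.
question = "Hello, who are you?"
response = model.chat(tokenizer, None, question, generation_config)
print(response)
```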

ms-swift, the training and deployment framework for large and multimodal models provided by the ModelScope community, already supports training the InternVL3.5 series. Users can prepare data in ms-swift's expected format for custom-dataset fine-tuning (a sketch of the format follows); after training, they can run inference with the corresponding command and push the model to ModelScope.
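As an illustration, the snippet below writes a tiny JSONL training file in the messages/images schema that ms-swift documents for multimodal custom datasets; the file paths and contents are placeholders, and the exact keys should be verified against the ms-swift documentation for InternVL3.5.

```python
import json

# Each line is one training sample. The "<image>" tag in the user turn
# marks where the corresponding entry from "images" is inserted.
samples = [
    {
        "messages": [
            {"role": "user", "content": "<image>What is shown in this image?"},
            {"role": "assistant", "content": "A cat sitting on a windowsill."},
        ],
        "images": ["images/cat.jpg"],  # placeholder path
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```

Fine-tuning and inference are then launched with ms-swift's `swift sft` and `swift infer` CLI commands; consult the ms-swift documentation for the model-specific flags.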

The release of InternVL3.5 marks another important advancement in multimodal large model technology, providing researchers and developers with powerful tools and promoting the development of multimodal artificial intelligence.

Open-source code and usage instructions:

https://github.com/OpenGVLab/InternVL

Model collection:

https://www.modelscope.cn/collections/InternVL35-Full-3871e58bf21349

Online experience:

https://chat.intern-ai.org.cn/