Xiaohongshu's Hi Lab has recently released and open-sourced its first self-developed multimodal large model, dots.vlm1. The model is built on a 1.2-billion-parameter NaViT visual encoder trained entirely from scratch, paired with the DeepSeek V3 large language model. Its performance in multimodal visual understanding and reasoning now approaches that of leading closed-source models such as Gemini 2.5 Pro and Seed-VL1.5, marking a new high point for open-source multimodal models.

Self-Developed Innovation, Leading Performance

The core highlight of dots.vlm1 is its natively self-developed NaViT visual encoder. Unlike the common approach of fine-tuning an existing, mature encoder, NaViT is trained from scratch and supports dynamic resolution, allowing it to adapt better to the diverse image conditions of real-world scenarios. The model also strengthens its generalization by combining pure visual supervision with text-visual supervision, and it is particularly strong on structured, non-photographic images such as tables, charts, formulas, and documents.
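To make the dynamic-resolution idea concrete, the sketch below shows, in simplified form, how a NaViT-style encoder can turn images of arbitrary size into variable-length patch sequences instead of resizing everything to a fixed square. This is an illustrative approximation rather than dots.vlm1's actual preprocessing; the patch size and helper name are assumptions.

```python
import torch

PATCH = 14  # assumed patch size; the real value used by dots.vlm1 is not stated here

def patchify_dynamic(image: torch.Tensor, patch: int = PATCH) -> torch.Tensor:
    """Split a (C, H, W) image into a variable-length sequence of flattened
    patches, padding H and W up to multiples of the patch size instead of
    resizing the image to a fixed square."""
    c, h, w = image.shape
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    image = torch.nn.functional.pad(image, (0, pad_w, 0, pad_h))
    gh, gw = image.shape[1] // patch, image.shape[2] // patch
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, gh, gw, p, p)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(gh * gw, c * patch * patch)
    return patches  # sequence length depends on the original resolution

# Two images with different aspect ratios yield different sequence lengths;
# a NaViT-style encoder packs such sequences into one batch with attention masks.
wide = patchify_dynamic(torch.rand(3, 336, 672))  # 24 x 48 = 1152 patches
tall = patchify_dynamic(torch.rand(3, 448, 224))  # 32 x 16 = 512 patches
print(wide.shape, tall.shape)
```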

On the data side, the Hi Lab team has built a large, carefully cleaned training dataset. They improved image-text alignment quality by rewriting web-sourced image-text data themselves and by using their self-developed dots.ocr tool to process PDF documents, laying a solid foundation for the model's cross-modal understanding capabilities.
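As a rough illustration of one piece of such a pipeline, the sketch below turns a PDF into (page image, extracted text) training pairs. The pdf2image usage is standard; run_ocr is a hypothetical placeholder for the OCR step (the announcement names dots.ocr but does not document its interface).

```python
from pathlib import Path
from pdf2image import convert_from_path  # pip install pdf2image

def run_ocr(page_image) -> str:
    """Hypothetical stand-in for an OCR step (e.g. dots.ocr); the real tool's
    interface is not described in the announcement."""
    raise NotImplementedError

def pdf_to_pairs(pdf_path: str, out_dir: str):
    """Render each PDF page to an image and pair it with OCR text,
    yielding (image_path, text) examples for image-text alignment training."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, page in enumerate(convert_from_path(pdf_path, dpi=200)):
        img_path = out / f"page_{i:04d}.png"
        page.save(img_path)
        yield str(img_path), run_ocr(page)
```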

Evaluation Results Comparable to Top Closed-Source Models

On mainstream international multimodal benchmarks, dots.vlm1's overall performance is impressive: on several benchmarks, including MMMU, MathVision, and OCR Reasoning, it reaches levels comparable to Gemini 2.5 Pro and Seed-VL1.5. In complex chart reasoning, STEM mathematical reasoning, and long-tail niche-scenario recognition, dots.vlm1 shows strong logical reasoning and analytical ability, handling even high-difficulty tasks such as Olympiad-level math.

Although it still lags behind SOTA closed-source models in extremely complex text reasoning tasks, its general mathematical reasoning and coding capabilities are already on par with mainstream large language models.

The Hi Lab team stated that it will continue to optimize the model, expanding the scale of cross-modal data and introducing methods such as reinforcement learning to further improve the model's reasoning generalization. By open-sourcing dots.vlm1, Xiaohongshu aims to inject new momentum into the multimodal large-model ecosystem and push the field forward.
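For readers who want to try the open-source release, the following is a minimal sketch of the generic Hugging Face Transformers pattern for a custom multimodal chat model. The repository id, chat format, and compatibility with AutoProcessor/AutoModelForCausalLM are assumptions here, not the model's documented API; the official model card is the authoritative reference, and a model of this scale will in practice require a multi-GPU setup.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed repository id; confirm the exact name on the official model card.
MODEL_ID = "rednote-hilab/dots.vlm1"

# trust_remote_code lets the repository ship its own processor/model classes.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# One image plus a text question, in the common chat-message layout.
image = Image.open("chart.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Summarize the main trend in this chart."},
    ],
}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```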