DeepSeek has announced the release of its next-generation document recognition model, DeepSeek-OCR2. The model makes significant advances in visual encoder design, aiming to address traditional models' inability to capture logical structure when processing complex document layouts.

The core highlight of DeepSeek-OCR2 is its in-house DeepEncoder V2 encoder. Unlike traditional visual models, which process images in a fixed grid order from left to right and top to bottom, the new model introduces the concept of a "visual causal flow": it dynamically adjusts the processing order based on image semantics, intelligently sorting visual content before recognizing text. This brings the machine's reading logic closer to how humans understand tables, formulas, and complex documents.
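To make the idea concrete, here is a minimal sketch of semantics-driven reordering versus fixed raster order. It is purely illustrative: names like `Patch` and `score` are hypothetical stand-ins for whatever priority signal DeepEncoder V2 actually learns, not DeepSeek's API.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    row: int
    col: int
    score: float  # hypothetical semantic priority predicted by the encoder

def raster_order(patches):
    # Traditional fixed grid order: top-to-bottom, then left-to-right.
    return sorted(patches, key=lambda p: (p.row, p.col))

def causal_flow_order(patches):
    # Dynamic order: visit semantically important regions first.
    return sorted(patches, key=lambda p: -p.score)

patches = [Patch(0, 0, 0.2), Patch(0, 1, 0.9), Patch(1, 0, 0.5)]
print([(p.row, p.col) for p in causal_flow_order(patches)])
# → [(0, 1), (1, 0), (0, 0)]
```

The point of the contrast is that raster order ignores content entirely, while a learned ordering can, for example, read a table header before the cells beneath it.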
In terms of architecture, the model retains an efficient encoder-decoder framework: after semantic modeling and reordering by DeepEncoder V2, the image is decoded by a mixture-of-experts (MoE) language model. Experimental data shows that on the OmniDocBench v1.5 benchmark, DeepSeek-OCR2 achieved an overall score of 91.09%, an improvement of 3.73% over the previous version. In reading-order accuracy in particular, its edit distance decreased significantly, indicating a stronger ability to restore content structure.
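For readers unfamiliar with MoE decoding, the following is a minimal sketch of the general top-k routing idea such decoders use: a gate scores each expert, only the k best experts run, and their outputs are combined by the renormalized gate weights. All shapes and names here are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Minimal top-k mixture-of-experts layer: route the token `x` to the
    k highest-scoring experts and sum their gate-weighted outputs."""
    logits = x @ gate_weights                 # one score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # renormalize over chosen experts
    return sum(w * (x @ expert_weights[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
dim, n_experts = 8, 4
x = rng.standard_normal(dim)
experts = rng.standard_normal((n_experts, dim, dim))
gate = rng.standard_normal((dim, n_experts))
y = moe_layer(x, experts, gate)
print(y.shape)  # → (8,)
```

Because only k of the experts execute per token, total parameter count can grow without a proportional increase in per-token compute, which is the usual motivation for MoE decoders.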
Additionally, DeepSeek-OCR2 demonstrates stronger stability in practical applications. In tests on PDF batch processing and online log data, the repetition rate in recognition output decreased significantly, meaning the model delivers higher-quality, more logically structured output while maintaining low resource consumption.
Key points:
Dynamic semantic sorting: Through "visual causal flow" technology, DeepSeek-OCR2 breaks with the traditional fixed grid recognition order and reads content dynamically based on semantics.
Significant performance gains: On an authoritative benchmark, the new model's recognition score improved by 3.73%, and reading-order accuracy was markedly enhanced.
Efficient MoE architecture: The model uses an MoE decoder, achieving higher recognition accuracy and reliability without a proportional increase in computational load.


