DeepSeek Launches New 3B OCR Model: A Revolutionary Breakthrough in Efficient Document Parsing

AIbase基地

Published in AI News · 4 min read · Oct 21, 2025

AI technology company DeepSeek recently launched a new optical character recognition (OCR) model called "DeepSeek-OCR." This model is an end-to-end vision-language model (VLM) designed to efficiently parse documents by compressing long text into a small set of visual tokens and then decoding them using a language model.

The research team reports about 97% decoding accuracy on the Fox benchmark when the number of text tokens is up to roughly 10 times the number of visual tokens, and says the model still produces useful output at 20 times compression. DeepSeek-OCR also performed well on the OmniDocBench benchmark while using far fewer visual tokens than traditional models.
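The compression ratio here is simply the number of text tokens divided by the number of visual tokens that represent them. A minimal sketch of that arithmetic follows; the function name and the page sizes are hypothetical, chosen only to reproduce the 10 times and 20 times cases mentioned above.

```python
def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Return how many text tokens each visual token stands for."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


# Hypothetical pages encoded into 100 visual tokens each.
for text_tokens in (1000, 2000):
    ratio = compression_ratio(text_tokens, 100)
    print(f"{text_tokens} text tokens -> {ratio:.0f}x compression")
```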

DeepSeek-OCR's architecture consists of two main components: DeepEncoder, a visual encoder for high-resolution input, and DeepSeek3B-MoE-A570M, a mixture-of-experts (MoE) decoder. The encoder combines SAM-based windowed (local) attention with convolutional compression, which keeps activation memory under control at high resolutions and reduces the number of output tokens. The decoder has 3 billion parameters in total, of which about 570 million are active per token.
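A back-of-the-envelope sketch can make both halves of this design concrete. The patch size (16 pixels) and compression factor (16 times) below are illustrative assumptions that the article does not state; only the 3 billion and 570 million parameter counts come from the text above.

```python
def encoder_output_tokens(side_px: int, patch_px: int = 16, compression: int = 16) -> int:
    """Visual tokens left after patching a square image and compressing.

    patch_px and compression are ASSUMED values for illustration only.
    """
    patches_per_side = side_px // patch_px
    patch_tokens = patches_per_side ** 2      # tokens seen by windowed attention
    return patch_tokens // compression        # tokens passed on to the decoder


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of decoder parameters used for each token in an MoE model."""
    return active_params / total_params


print(encoder_output_tokens(1024))             # 64 * 64 = 4096 patches -> 256 tokens
print(f"{active_fraction(570e6, 3e9):.0%}")    # about 19% of the decoder is active
```

Under these assumptions, a 1024 x 1024 page shrinks from 4,096 patch tokens to 256 visual tokens, and the decoder touches roughly a fifth of its weights per token.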

DeepEncoder offers several resolution modes: Tiny, Small, Base, and Large, each with its own input resolution and number of visual tokens. Two dynamic modes, Gundam and Gundam-Master, adjust the token budget according to page complexity.
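One way to reason about the fixed modes is as a small lookup table from which the cheapest acceptable mode is picked. In the sketch below, the resolutions and token budgets are assumptions recalled from the DeepSeek-OCR paper rather than figures given in this article, so they should be checked against the paper before use; the `smallest_mode_within` helper and its 10 times cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mode:
    name: str
    side_px: int         # ASSUMED square input resolution
    visual_tokens: int   # ASSUMED visual-token budget


# ASSUMPTION: values recalled from the paper, not stated in the article.
MODES = [
    Mode("Tiny", 512, 64),
    Mode("Small", 640, 100),
    Mode("Base", 1024, 256),
    Mode("Large", 1280, 400),
]


def smallest_mode_within(text_tokens: int, max_ratio: float = 10.0) -> Optional[Mode]:
    """Return the cheapest fixed mode whose compression ratio is <= max_ratio."""
    for mode in MODES:  # ordered from fewest to most visual tokens
        if text_tokens / mode.visual_tokens <= max_ratio:
            return mode
    return None  # too dense for a fixed mode; a dynamic mode may be needed


print(smallest_mode_within(900).name)   # Small (900 / 100 = 9x)
print(smallest_mode_within(5000))       # None (even Large would be 12.5x)
```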

The DeepSeek team trained the model in phases: first DeepEncoder alone, with a next-token prediction objective, and then the full system across multiple nodes. According to the team, the resulting system can process more than 200,000 pages per day. For practical use, the team recommends starting with Small mode and switching to Gundam mode when a page contains dense small text or a large number of tokens.
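The throughput figure and the mode recommendation can both be expressed in a few lines. This is a sketch only: the `token_threshold` cutoff is a hypothetical value, since the article does not say how many tokens count as "a large number".

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_second(pages_per_day: int) -> float:
    """Convert a daily page count into a sustained per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


def recommend_mode(dense_small_text: bool, text_tokens: int,
                   token_threshold: int = 1000) -> str:
    """Start with Small; switch to Gundam for dense or token-heavy pages.

    token_threshold is a HYPOTHETICAL cutoff, not a value from the article.
    """
    if dense_small_text or text_tokens > token_threshold:
        return "Gundam"
    return "Small"


print(f"{pages_per_second(200_000):.2f} pages/s")  # 2.31 pages/s
print(recommend_mode(False, 400))                  # Small
print(recommend_mode(True, 400))                   # Gundam
```

Put another way, 200,000 pages per day corresponds to a little over two pages per second, sustained around the clock.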

The release of DeepSeek-OCR marks a significant advancement in the field of document artificial intelligence. Its efficiency and flexibility make it adaptable for processing various types of documents.

Paper: https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf

Huggingface: https://huggingface.co/deepseek-ai/DeepSeek-OCR

Key points:
🌟 DeepSeek-OCR is a newly released 3B vision-language model with efficient OCR and document parsing capabilities.
📊 The model achieved 97% decoding accuracy on the Fox benchmark and maintains good performance even with significant compression.
🔧 DeepEncoder supports multiple modes and resolution choices to adapt to different document complexities and needs.

This article is from AIbase Daily
