On October 16, the PaddlePaddle team officially released its latest vision-language model, PaddleOCR-VL, which drew immediate attention across the global OCR (Optical Character Recognition) community. With only 0.9B parameters, the model scored 92.56 on the authoritative OmniDocBench V1.5 benchmark, surpassing mainstream models including DeepSeek-OCR and taking first place in the global OCR rankings.

As of October 21, the top three spots on Hugging Face's global Trending Models list were all held by OCR models:
🥇PaddleOCR-VL (PaddlePaddle)
🥈DeepSeek-OCR
🥉NanonetOCR
Among them, Baidu's PaddleOCR-VL has held the top spot for five consecutive days, making it the most closely watched open-source OCR model at the moment.
PaddleOCR-VL supports recognition in 109 languages, accurately parses text, tables, formulas, and charts, and can reconstruct the semantic structure of a document. In other words, it does not just "recognize characters" but also "understands" complex document content, which gives it high practical value in areas such as research paper parsing, invoice recognition, and knowledge extraction.
Notably, the DeepSeek team specifically acknowledged PaddleOCR in their paper and disclosed that part of their training data was annotated using PaddleOCR. This detail points to the real logic behind the current boom in OCR models: institutions such as Baidu, DeepSeek, and Shanghai AI Lab open-sourced their OCR models almost simultaneously, and their purpose is not merely to compete on recognition performance but to provide the foundational capability of cleaning and annotating data for large model training.
Put differently, the core of this "OCR arms race" is not just who recognizes text more accurately, but who can help AI understand the world's text and images faster.






