Tencent Hunyuan and Others Jointly Release the First Ancient Script OCR Evaluation Benchmark Chronicles-OCR

AIbase基地

Published in AI News · 4 min read · May 19, 2026

On May 18, Tencent Hunyuan, the SSV Digital Culture Lab, and the SSV Technical Architecture Department, together with the Key Laboratory of Oracle Bone Script Information Processing at Anyang Normal University, the Institute of Information Engineering of the Chinese Academy of Sciences, and Nankai University, officially launched Chronicles-OCR, the first industry benchmark for ancient Chinese character recognition to cover the complete evolutionary trajectory of the "Seven Script Changes." The benchmark aims to measure how well multimodal large language models (MLLMs) perceive Chinese characters as their visual distribution drifts across three thousand years of script evolution, and to promote breakthroughs in the underlying technologies of digital humanities.

The dataset was cross-annotated at multiple levels by domain experts and contains 2,800 high-quality, strictly balanced images. To account for the differing characteristics of early scripts (oracle bone script, bronze inscriptions, seal script) and mature scripts (clerical, regular, running, cursive), the project team designed a stage-adaptive annotation paradigm and defined four core tasks: cross-era character detection, fine-grained ancient character recognition, ancient text transcription, and font classification. Together, these tasks allow visual perception to be evaluated separately from semantic reasoning.

An evaluation of 28 mainstream large models, including GPT-5, Gemini 3.1 Pro, and Claude Opus 4.7, exposed shortcomings in current multimodal capabilities. In end-to-end detection on early scripts, mainstream models failed outright because they could not rely on modern layout priors. In fine-grained recognition, the highest accuracy was only 27.1%. In font classification, models tended to key on the texture of the medium rather than on fine stroke details. Notably, the experiments showed that enabling reasoning mode amplified perceptual uncertainty and lowered performance.
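To make a figure like the 27.1% recognition accuracy more concrete, here is a minimal sketch of how per-script accuracy could be tallied for a fine-grained recognition task. The article does not describe the benchmark's actual scoring protocol, so the record layout, the function name `accuracy_by_script`, the script labels, and the toy data below are all assumptions for illustration only.

```python
from collections import defaultdict


def accuracy_by_script(records):
    """Return exact-match accuracy per script style.

    `records` is an iterable of (script, gold_char, predicted_char)
    tuples. This layout is hypothetical, not the benchmark's format.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for script, gold, pred in records:
        totals[script] += 1
        if gold == pred:
            hits[script] += 1
    return {script: hits[script] / totals[script] for script in totals}


# Toy predictions, invented purely for illustration.
records = [
    ("oracle_bone", "王", "王"),
    ("oracle_bone", "貞", "卜"),
    ("regular", "書", "書"),
    ("regular", "道", "道"),
]

scores = accuracy_by_script(records)
for script, acc in scores.items():
    print(f"{script}: {acc:.1%}")
```

Reporting the score separately for each script style, rather than as one pooled number, keeps a strong result on mature scripts from masking a weak result on early scripts.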

The open-source release of Chronicles-OCR quantifies the gap between top commercial models and the practical needs of ancient script research, and it clarifies a technical path toward better micro-perception for both academia and industry. Moving large models from "recognizing characters" to "reading history" will be a crucial step for multimodal models in tackling long-tail vertical scenarios and preserving cultural heritage.

This article is from AIbase Daily
