Large Vision Models Stumble: First OCR Evaluation Benchmark for Ancient Chinese Scripts Open-Sourced

AIbase基地

Published in AI News · May 19, 2026

Top-tier artificial intelligence needs to read not only the modern code scrolling across a screen but also the inscriptions carved on tortoise shells three thousand years ago. According to OSCHINA, the Tencent Hunyuan large model team, the SSV Digital Culture Lab, and other institutions, together with several universities and the Palace Museum, have jointly released "Chronicles-OCR". It is described as the industry's first benchmark for ancient Chinese characters that covers the full evolutionary trajectory of the "seven forms of Chinese characters".

To reflect the real recognition capabilities of large models, the dataset was cross-annotated by domain experts in multiple rounds and contains 2,800 high-quality, strictly balanced images. For the older scripts (oracle bone script, bronze script, and seal script), the team used fine-grained character-level annotations. For the more mature scripts (clerical, regular, running, and cursive), it used sequence-level transcriptions that preserve the original reading order.
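The two annotation granularities described above might be represented roughly as follows. This is a minimal sketch only: the class names, field names, script identifiers, and the bounding-box convention are hypothetical illustrations, not the published Chronicles-OCR schema.

```python
from collections import Counter
from dataclasses import dataclass, field

# The script grouping follows the article; the identifiers are hypothetical.
CHARACTER_LEVEL_SCRIPTS = {"oracle_bone", "bronze", "seal"}
SEQUENCE_LEVEL_SCRIPTS = {"clerical", "regular", "running", "cursive"}


@dataclass
class CharBox:
    """One annotated character: its label plus an assumed (x, y, w, h) box."""
    label: str
    box: tuple[int, int, int, int]


@dataclass
class Sample:
    """One benchmark image with whichever annotation type its script uses."""
    image_path: str
    script: str
    chars: list[CharBox] = field(default_factory=list)  # character-level
    transcription: str = ""  # sequence-level, in original reading order

    def granularity(self) -> str:
        """Return which annotation style applies to this sample's script."""
        if self.script in CHARACTER_LEVEL_SCRIPTS:
            return "character"
        if self.script in SEQUENCE_LEVEL_SCRIPTS:
            return "sequence"
        raise ValueError(f"unknown script: {self.script}")


def is_balanced(samples: list[Sample]) -> bool:
    """True when every script present has the same number of images."""
    counts = Counter(s.script for s in samples)
    return len(set(counts.values())) <= 1
```

Keeping both annotation types on one record, with the script deciding which one applies, is just one possible design; the real dataset may store them in separate files or formats.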

Major vision models all fall short

Building on this benchmark, the project team designed four core tasks of progressively increasing difficulty that strictly decouple a large model's "visual perception" from its "semantic reasoning". The team evaluated 28 mainstream multimodal large language models, including GPT-5, Gemini 3.1 Pro, and Claude Opus 4.7, and the results were surprising.

Faced with ancient scripts that lack the layout priors of modern documents, mainstream large models failed completely on end-to-end detection tasks, and the highest accuracy on fine-grained recognition reached only 27.1%. More surprisingly, the experiments showed that enabling a model's reasoning mode amplified its perceptual uncertainty and degraded recognition performance further.
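For a sense of what a figure like 27.1% measures, character-level recognition accuracy is commonly computed as the share of characters whose predicted label matches the expert label. The sketch below assumes exact-match, top-1 scoring with predictions already aligned to the annotated characters; the article does not specify the benchmark's actual metric, and the function name is hypothetical.

```python
def top1_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of positions where the predicted character equals the reference.

    Assumes one prediction per annotated character, aligned by index.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

Under this assumed definition, a score of 27.1% would mean that roughly one character in four was identified correctly.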

Shortcomings revealed in fine-grained brushstroke recognition

The evaluation also found that, when classifying scripts, current large vision models tend to key on the texture of the carrier medium rather than on the fine-grained brushstroke styles that actually distinguish one script from another. In other words, today's top AI models remain far from truly "understanding" ancient Chinese scripts.

Chinese characters have evolved continuously from the oracle bones of the Yin Dynasty to the present day, and every stroke carries the continuity of a civilization. By open-sourcing Chronicles-OCR, the team does not shy away from this technical reality. The gaps the benchmark makes visible give future large vision models a clear optimization direction: moving from simply "reading characters" to deeply "reading history".

This article is from AIbase Daily
