On May 18, Tencent Hunyuan, the SSV Digital Culture Lab, and the SSV Technical Architecture Department, together with the Key Laboratory of Oracle Bone Script Information Processing at Anyang Normal University, the Institute of Information Engineering of the Chinese Academy of Sciences, and Nankai University, officially launched Chronicles-OCR, the first industry benchmark for ancient Chinese character recognition to cover the complete evolutionary trajectory of the "Seven Script Changes." The benchmark aims to measure accurately the perceptual capabilities of multimodal large language models (MLLMs) when they face the visual distribution drift of Chinese characters across three thousand years, and to promote breakthroughs in the underlying technologies of digital humanities.

The dataset contains 2,800 high-quality, strictly balanced images, annotated by domain experts through multi-level cross-annotation. To account for the differing characteristics of early scripts (oracle bone script, bronze inscriptions, seal script) and mature scripts (clerical, regular, running, cursive), the project team designed a stage-adaptive annotation paradigm and established four core tasks: cross-era character detection, fine-grained ancient character recognition, ancient text transcription, and font classification. Together, these tasks allow visual perception and semantic reasoning to be evaluated separately.

In an evaluation of 28 mainstream large models, including GPT-5, Gemini 3.1 Pro, and Claude Opus 4.7, the benchmark revealed shortcomings in the industry's current multimodal capabilities. In end-to-end detection on early scripts, mainstream models failed across the board because these scripts lack modern layout priors. In fine-grained recognition, the highest accuracy was only 27.1%. In font classification, models tended to identify the texture of the medium rather than the fine details of the strokes. Notably, the experiments showed that enabling reasoning mode amplified perceptual uncertainty and lowered performance.

The open-source release of Chronicles-OCR not only quantifies the gap between top commercial models and the actual research needs of ancient script studies but also clarifies, for both academia and industry, a technical path toward better micro-level perception. Moving from "recognizing characters" to "reading history" will be a crucial step for multimodal large models in tackling long-tail vertical scenarios and preserving cultural heritage.
