GPT-5 and Gemini both fail on oracle bone script as Tencent releases Chronicles-OCR, the first ancient character evaluation benchmark

ME News, May 18 (UTC+8). According to Beating Monitoring, Tencent HunYuan and the SSV Digital Culture Laboratory, together with institutions including the Institute of Information Engineering, Chinese Academy of Sciences, have officially launched Chronicles-OCR, the first ancient character perception benchmark to cover the “Seven Body Changes”, the evolution of Chinese writing across seven script styles. The benchmark contains 2,800 images cross-annotated by experts and, for the first time, quantifies recognition difficulty on a uniform basis across all seven scripts, from oracle bone script to cursive script.

The research team evaluated 28 mainstream multimodal large language models, and nearly all of them fail on ancient scripts. In the cross-era character detection task, the core metrics of GPT-5 and Gemini 2.5 Pro are close to 0, and even the best-performing model scores only 16.5. Even when bounding boxes are drawn directly on the images so that the models can skip the localization step, the highest accuracy is only 27.1%, and Gemini 3.1 Pro reaches just 14.0% on oracle bone script.

This confirms that modern models rely heavily on the standardized layout priors of modern documents. When confronted with unconstrained, high-noise ancient physical media, their text segmentation mechanisms break down. The script classification results further indicate that models often recognize the texture of the carrier (such as tortoise shell patterns or bronze patina) rather than the actual character strokes.

The experiment also reveals a counterintuitive phenomenon: enabling “thinking mode” actually lowers the ancient character recognition rate. A comparison shows that almost all models that support this mode perform worse once it is activated. When the underlying visual perception is missing, chain-of-thought reasoning not only fails to correct errors but can become a hallucination amplifier, producing highly confident incorrect answers.

(Source: BlockBeats)
