ME News, May 18 (UTC+8). According to Beating Monitoring, Tencent HunYuan and the SSV Digital Culture Laboratory, in collaboration with institutions including the Institute of Information Engineering, Chinese Academy of Sciences, have officially launched Chronicles-OCR, the first ancient character perception benchmark covering the “Seven Body Changes”, the evolution of Chinese writing across seven script styles. The benchmark contains 2,800 images cross-annotated by experts and, for the first time, quantifies recognition difficulty on a uniform scale across all seven scripts, from oracle bone script to cursive script.

The research team evaluated 28 mainstream multimodal large language models, and the results show that nearly all of them fail on ancient scripts. In the cross-era character detection task, the core metrics of GPT-5 and Gemini 2.5 Pro are close to 0, and even the best-performing model reaches only 16.5. Even when bounding boxes are drawn directly on the images so that models can skip the localization step, the highest accuracy is only 27.1%, and Gemini 3.1 Pro reaches just 14.0% on oracle bone script.

This confirms that modern models rely heavily on the standardized layout priors of modern documents. When confronted with unconstrained, high-noise ancient physical media, their text segmentation mechanisms break down. The script classification results further indicate that models often recognize the texture of the carrier (such as tortoise shell patterns or bronze patina) rather than the character strokes themselves.

The experiment also reveals a counterintuitive phenomenon: enabling “thinking mode” lowers the ancient character recognition rate. A comparison shows that almost all models supporting this mode perform worse once it is activated. When the underlying visual perception is missing, chain-of-thought reasoning not only fails to correct errors but can become a hallucination amplifier, producing highly confident incorrect answers.

(Source: BlockBeats)

GPT-5 and Gemini are both completely defeated by oracle bone script as Tencent releases its first ancient character evaluation benchmark, Chronicles-OCR.
