ME AI News: According to Beating Monitoring, Baidu has open-sourced Unlimited-OCR, a large model for intelligent document parsing, and released an accompanying technical report. The report lists the project's technical lead under the pseudonym "YY". Industry speculation widely holds that "YY" is Wei Haoran, a former core author of DeepSeek-OCR. Unlimited-OCR is itself built on top of DeepSeek-OCR.
Unlimited-OCR scored 93.92% on OmniDocBench v1.6, a long-document parsing benchmark, setting a new end-to-end SOTA record. Conventional large models for document parsing tend to slow down sharply and consume large amounts of memory on multi-page long documents, because the key-value cache (KV cache) grows linearly with sequence length.
To address this slowdown, Baidu introduced Reference Sliding Window Attention (R-SWA). While decoding text, the model attends only to the full set of image features and to the most recently generated text within a fixed window (128 tokens by default), which caps the overall size of the KV cache at a constant.
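The attention pattern described above can be illustrated with a minimal sketch. This is not Baidu's implementation: the function name `rswa_visible_positions`, the token layout (image tokens first, then generated text), and the choice to count the window over already-generated text tokens are all assumptions made for illustration.

```python
def rswa_visible_positions(num_image_tokens, num_text_tokens, window=128):
    """Positions the next decoding step may attend to (illustrative only).

    Assumed layout: image tokens occupy [0, num_image_tokens), followed by
    the text tokens generated so far.
    """
    # Every image feature stays visible, however much text has been generated.
    image_positions = list(range(num_image_tokens))

    # Only the most recent `window` text tokens stay visible.
    text_start = num_image_tokens
    text_end = num_image_tokens + num_text_tokens
    window_start = max(text_start, text_end - window)
    recent_text = list(range(window_start, text_end))

    return image_positions + recent_text


if __name__ == "__main__":
    # Hypothetical sizes: 4 image tokens, 10 generated text tokens, window of 3.
    print(rswa_visible_positions(4, 10, window=3))
```

In this sketch the number of visible positions never exceeds `num_image_tokens + window`, which is the property that lets the cache stop growing once the window is full.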
Because the image features always stay in view, R-SWA keeps image details from degrading as earlier text slides out of the window, and it holds inference speed and memory consumption constant when parsing documents longer than 40 pages. In tests, it was 12.7% faster than DeepSeek-OCR.
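To make the contrast between a linearly growing cache and a capped one concrete, the following sketch counts cached token entries under both schemes. The function names and the figure of 1,000 image tokens are hypothetical, and the count ignores per-layer and per-head factors, which scale both schemes equally.

```python
def kv_entries_full_attention(num_image_tokens, num_text_tokens):
    """Cached token entries when every past token is kept (linear growth)."""
    return num_image_tokens + num_text_tokens


def kv_entries_rswa(num_image_tokens, num_text_tokens, window=128):
    """Cached token entries when only image tokens plus a text window are kept."""
    return num_image_tokens + min(num_text_tokens, window)


if __name__ == "__main__":
    image_tokens = 1000  # hypothetical number of image feature tokens
    for generated in (100, 1_000, 10_000, 100_000):
        full = kv_entries_full_attention(image_tokens, generated)
        capped = kv_entries_rswa(image_tokens, generated)
        print(f"text tokens={generated:>7}  full={full:>7}  r-swa={capped:>5}")
```

Under these assumptions the full-attention count keeps rising with the generated text, while the windowed count stops at `image_tokens + window` once 128 text tokens have been produced.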
Baidu has open-sourced the code and weights of Unlimited-OCR under the MIT license. The model supports mainstream inference engines including Hugging Face Transformers, vLLM, and SGLang, and SGLang already supports cache optimization for R-SWA.
Going forward, the team plans to extend reference sliding window attention to other reference-based tasks such as automatic speech recognition (ASR) and translation.
(Source: BlockBeats)

Led by a former core member of the DeepSeek team, Baidu open-sources its 3B document parsing model, Unlimited-OCR