Baidu open-sources its 3B document parsing model Unlimited-OCR, reportedly led by a former core DeepSeek-OCR author

ME AI News: According to Beating Monitoring, Baidu has open-sourced Unlimited-OCR, a large model for intelligent document parsing, and released an accompanying technical report. The report lists the project's technical director under the pseudonym "YY". Industry observers widely speculate that "YY" is Wei Haoran, a former core author of DeepSeek-OCR. Unlimited-OCR is itself built on DeepSeek-OCR.
Unlimited-OCR scored 93.92% on the long document parsing benchmark OmniDocBench v1.6, setting a new end-to-end SOTA record. Traditional large models for document parsing often slow down significantly and consume large amounts of memory on multi-page long texts, because the key-value cache (KV cache) grows linearly with the length of the generated text.
To address the slowdown, Baidu introduced a reference sliding window attention mechanism, R-SWA. During text decoding, the model attends only to all image features plus the most recently generated text within a fixed window (128 tokens by default), which caps the size of the KV cache at a constant.
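The attention pattern described above can be illustrated with a small mask builder. This is a minimal sketch, not Baidu's implementation: the function name `rswa_mask`, the layout (image features first, text tokens after), and the choice to count the current token as part of the window are all assumptions made for illustration.

```python
def rswa_mask(num_image_tokens: int, num_text_tokens: int, window: int = 128) -> list[list[bool]]:
    """Return mask[i][j] = True when text token i may attend to position j.

    Assumed layout: positions 0..num_image_tokens-1 hold image (reference)
    features; the generated text tokens follow.
    """
    total = num_image_tokens + num_text_tokens
    mask = []
    for i in range(num_text_tokens):
        query_pos = num_image_tokens + i
        row = [False] * total
        # Reference part: every image feature stays visible to every text token.
        for j in range(num_image_tokens):
            row[j] = True
        # Sliding part: only the most recent `window` text tokens
        # (assumption: the window includes the current token).
        start = max(num_image_tokens, query_pos - window + 1)
        for j in range(start, query_pos + 1):
            row[j] = True
        mask.append(row)
    return mask
```

With 4 image tokens and a window of 3, each text token sees all 4 image positions and at most 3 text positions, no matter how much text has already been generated.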
Because the image features always stay visible, R-SWA keeps image details from being lost as older text slides out of the window, and it also keeps inference speed and memory consumption constant when parsing documents longer than 40 pages. In tests, it was 12.7% faster than DeepSeek-OCR.
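The constant-memory behavior follows from simple counting. The sketch below compares the number of cached positions under full attention with a windowed cache that keeps all image features; the function name `kv_cache_entries` and the token counts used with it are hypothetical, and real memory use also depends on layer count, head dimensions, and precision, which this ignores.

```python
from typing import Optional


def kv_cache_entries(num_image_tokens: int, generated_tokens: int,
                     window: Optional[int] = None) -> int:
    """Count cached positions after `generated_tokens` text tokens.

    window=None models a full KV cache that keeps every generated token.
    An integer window models a cache that keeps all image features plus
    only the most recent `window` text tokens.
    """
    if window is None:
        # Full attention: the cache grows by one entry per generated token.
        return num_image_tokens + generated_tokens
    # Windowed cache: text entries are capped at `window`.
    return num_image_tokens + min(generated_tokens, window)
```

Once more than `window` text tokens have been generated, the windowed count stops growing, while the full-cache count keeps increasing with every new token.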
Baidu has open-sourced the code and weights of Unlimited-OCR under the MIT license. The model supports mainstream engines such as Hugging Face Transformers, vLLM, and SGLang, and SGLang already supports cache optimization for R-SWA.
In the future, the team plans to extend the reference sliding window attention to other reference-based tasks such as speech recognition (ASR) and translation.
(Source: BlockBeats)