Firecrawl rewrites its PDF parsing engine in Rust, making it up to 5.7 times faster

ME News Report, April 15 (UTC+8): According to 1M AI News monitoring, the web data extraction tool Firecrawl has released Fire-PDF, a PDF parsing engine rewritten in Rust. It converts PDFs to structured Markdown 3.5 to 5.7 times faster than the previous generation, with an average processing time below 400 milliseconds per page.
The speedup comes mainly from avoiding unnecessary GPU calls. Firecrawl has also open-sourced pdf-inspector, a Rust library that classifies each PDF page within milliseconds: text-only pages are extracted natively without touching the GPU, and only scanned or image-dense pages are sent to a neural layout model and the GLM-OCR vision-language model.
For example, in a financial report with 150 text pages and 60 scanned pages, most pages do not require the GPU.
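The routing idea can be illustrated with a minimal Python sketch. It is not pdf-inspector's actual API (the library itself is written in Rust): the `PageStats` features, the thresholds, and every function name here are hypothetical and chosen only to show how classifying pages up front keeps text pages away from the GPU.

```python
from dataclasses import dataclass
from enum import Enum


class PageKind(Enum):
    TEXT = "text"                # embedded text layer, extract natively
    SCANNED = "scanned"          # no usable text layer, needs OCR
    IMAGE_DENSE = "image_dense"  # mostly images, needs the layout model


@dataclass(frozen=True)
class PageStats:
    # Hypothetical per-page features; not pdf-inspector's real data model.
    text_chars: int          # characters found in the embedded text layer
    image_area_ratio: float  # fraction of the page covered by images (0..1)


def classify_page(stats: PageStats, min_chars: int = 50,
                  max_image_ratio: float = 0.5) -> PageKind:
    """Cheap CPU-only classification; both thresholds are illustrative."""
    if stats.text_chars < min_chars:
        return PageKind.SCANNED
    if stats.image_area_ratio > max_image_ratio:
        return PageKind.IMAGE_DENSE
    return PageKind.TEXT


def needs_gpu(kind: PageKind) -> bool:
    # Only non-text pages are forwarded to the GPU-backed models.
    return kind is not PageKind.TEXT


def gpu_free_share(pages: list[PageStats]) -> float:
    """Fraction of pages that can be extracted without any GPU call."""
    if not pages:
        return 0.0
    skipped = sum(1 for p in pages if not needs_gpu(classify_page(p)))
    return skipped / len(pages)


# The report from the example: 150 text pages and 60 scanned pages.
report = [PageStats(2000, 0.05)] * 150 + [PageStats(0, 1.0)] * 60
share = gpu_free_share(report)
print(f"{share:.1%} of pages skip the GPU")  # 71.4% of pages skip the GPU
```

Under these assumptions, 150 of the 210 pages (about 71%) never reach the GPU.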
In terms of accuracy, Fire-PDF sets parameters for different content types: tables get a higher token limit and up to 25 seconds of generation time, formulas are preserved as LaTeX, and the reading order of multi-column layouts is predicted by a neural network.
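Per-content-type settings of this kind could be expressed as a small lookup table, sketched below in Python. Only the 25-second table limit and the LaTeX output for formulas come from the report; the token limits, the default values, and all names (`RegionParams`, `params_for`) are placeholders and do not reflect Firecrawl's real configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionParams:
    max_tokens: int             # generation budget (placeholder values)
    timeout_s: Optional[float]  # generation time limit; None = not specified
    output_format: str          # target notation for the extracted content


# Hypothetical default for content types the report does not mention.
DEFAULT_PARAMS = RegionParams(max_tokens=1024, timeout_s=None,
                              output_format="markdown")

PARAMS_BY_TYPE = {
    # Tables: higher token limit and up to 25 seconds of generation time.
    "table": RegionParams(max_tokens=4096, timeout_s=25.0,
                          output_format="markdown"),
    # Formulas are preserved as LaTeX.
    "formula": RegionParams(max_tokens=1024, timeout_s=None,
                            output_format="latex"),
}


def params_for(region_type: str) -> RegionParams:
    """Look up the settings for a region type, falling back to the default."""
    return PARAMS_BY_TYPE.get(region_type, DEFAULT_PARAMS)


print(params_for("table").timeout_s)        # 25.0
print(params_for("formula").output_format)  # latex
```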
Fire-PDF is automatically enabled for all Firecrawl users without configuration.
(Source: BlockBeats)
