With small models that output JSON directly, on-device document extraction no longer has to generate a long free-text answer first and then parse it. The 450M model can run quite smoothly.

CoinNetwork
Liquid AI Open-Sources Small Multimodal Models That Extract Images Directly into Structured JSON on Edge Devices
Liquid AI has open-sourced two small multimodal models, lfm2.5-vl-1.6b-extract and lfm2.5-vl-450m-extract, optimized specifically for structured data extraction from images. Given a list of fields, the models convert an image directly into JSON on the device, eliminating the step of generating full text and then parsing it. Both models are released under the LFM Open License v1.0. According to Liquid AI's own evaluations, they perform well in document scanning, in-vehicle cabin understanding, and industrial inspection scenarios; in benchmark tests, the 1.6B model rivals 4B general-purpose models, while the 450M model is on par with 2B models. The weights are now available for download on Hugging Face.
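As a rough illustration of the field-list-to-JSON workflow described above, here is a minimal Python sketch. It does not call the models or any Liquid AI API: the prompt wording, the helper names `build_prompt` and `parse_extraction`, the field names, and the sample response are all hypothetical, and the actual input format the models expect may differ.

```python
import json


def build_prompt(fields):
    """Build a hypothetical extraction instruction from a field list."""
    return "Extract the following fields as a JSON object: " + ", ".join(fields)


def parse_extraction(raw_output, fields):
    """Parse the model's raw text as JSON and keep only the requested fields.

    Fields missing from the response are set to None; unexpected keys are dropped.
    """
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: data.get(name) for name in fields}


# Hypothetical field list for an invoice scan.
fields = ["invoice_number", "total", "date"]
prompt = build_prompt(fields)

# Stand-in for a model response; no real inference happens here.
raw = '{"invoice_number": "INV-001", "total": "42.00", "extra": "ignored"}'
record = parse_extraction(raw, fields)

print(prompt)
print(record)
```

Because the response is already JSON, the downstream code only needs `json.loads` plus a check against the requested fields, rather than a custom parser for free-form text.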