Liquid AI open-sources small multimodal models that extract JSON-structured data directly from images on edge devices

According to “Beating Monitoring” from Data Insight, Liquid AI has open-sourced two small multimodal models: LFM2.5-VL-1.6B-Extract and LFM2.5-VL-450M-Extract. Both are optimized for extracting structured data from images: given a user-specified list of fields, they convert an image directly into JSON on the device. This removes the usual two-step process in which a multimodal model first generates free-form text and a second stage then parses that text.
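
To make the field-list idea concrete, here is a minimal Python sketch of the application-side code around such a model. It is an illustration under stated assumptions, not Liquid AI's API: the prompt wording, the helper names `build_extraction_prompt` and `parse_extraction`, and the sample reply are all hypothetical, and the model call itself is replaced by a hard-coded string.

```python
import json


def build_extraction_prompt(fields):
    """Build a hypothetical instruction listing the JSON keys to extract.

    The wording is an assumption; the real models may expect another format.
    """
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the image and reply with "
        f"a JSON object only: {field_list}"
    )


def parse_extraction(raw_output, fields):
    """Parse the model's reply and keep only the requested fields.

    Fields missing from the reply are set to None; unrequested keys are dropped.
    """
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: data.get(name) for name in fields}


# Hypothetical field list for a scanned invoice.
fields = ["invoice_number", "total", "date"]
prompt = build_extraction_prompt(fields)

# Stand-in for the model's reply; a real deployment would obtain this
# string from the on-device model instead of hard-coding it.
fake_output = '{"invoice_number": "INV-001", "total": "42.50", "extra": "ignored"}'

result = parse_extraction(fake_output, fields)
print(result)
```

Because the reply is already JSON keyed by the requested fields, the application only has to load and check it, rather than search free-form text for the values.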

The models come in two sizes, 1.6 billion (1.6B) and 450 million (450M) parameters, and are released under the LFM Open License v1.0. According to Liquid AI's own evaluations, they perform well in scenarios such as document scanning, in-vehicle cabin understanding, and industrial inspection. In benchmark tests, the 1.6B model is reported to be on par with general-purpose multimodal models in the 4 billion (4B) parameter class, while the 450M model is comparable to models in the 2 billion (2B) parameter class.

For deployment, the models have been adapted to a range of smart-hardware and edge-device system-on-chips (SoCs), so they can run offline in edge scenarios such as in-vehicle cabin understanding, document scanning, and industrial inspection. Liquid AI has made the model weights available for download on Hugging Face.
