Microsoft has finally entered the fray with a 7B computer-use agent: Fara-7B operates the browser directly from screenshots, and its open release under the MIT license is worth a look

MeNews
Microsoft releases Fara-7B, its first 7B-parameter computer-use agent model
AIMPACT News, May 16 (UTC+8): Microsoft has launched Fara-7B, its first 7B-parameter agentic small language model designed specifically for computer use. The model uses a multimodal decoder architecture that takes screenshot images and text context as input and directly predicts its chain-of-thought reasoning together with the actions to perform and their parameters. Built on Qwen2.5-VL (7B), it supports a context length of 128k tokens, was trained for 2.5 days on 64 H100 GPUs, and was released under the MIT license on November 24, 2025. Fara-7B perceives the browser through screenshots and, combining them with its internal reasoning and a record of its action history, predicts the next action and its arguments (such as click coordinates). Its training relies on a large-scale, fully synthetic dataset. The model can plan and execute advanced tasks (such as booking restaurants, applying for jobs, planning)
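The screenshot-in, action-out loop described above can be sketched roughly as follows. This is only an illustrative outline, not Microsoft's implementation or the Fara-7B API: the names `Action`, `Step`, `run_agent`, and `scripted_policy` are hypothetical, the action names and arguments are made up for the example, and the model is replaced by a scripted stand-in so the code runs without any model or browser.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Action:
    """One predicted step: an action name plus its arguments (hypothetical schema)."""
    name: str                                   # e.g. "click" or "terminate"
    args: Dict[str, object] = field(default_factory=dict)


@dataclass
class Step:
    """History entry: the reasoning text and the action chosen at that step."""
    thought: str
    action: Action


# A policy maps (task text, screenshot bytes, history) to (thought, action).
Policy = Callable[[str, bytes, List[Step]], Tuple[str, Action]]


def run_agent(
    task: str,
    policy: Policy,
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Step]:
    """Observe, predict, act; repeat until the policy terminates or steps run out."""
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()          # the only view of the browser
        thought, action = policy(task, screenshot, history)
        history.append(Step(thought, action))
        if action.name == "terminate":
            break
        execute(action)                         # e.g. click at (x, y) in the browser
    return history


def scripted_policy(task: str, screenshot: bytes, history: List[Step]) -> Tuple[str, Action]:
    """Stand-in for the model (assumption): click once, then stop."""
    if not history:
        return "The search box is near the top of the page.", Action("click", {"x": 640, "y": 120})
    return "The task looks complete.", Action("terminate")


executed: List[Action] = []
trace = run_agent(
    task="find a restaurant",
    policy=scripted_policy,
    take_screenshot=lambda: b"",                # placeholder instead of a real screenshot
    execute=executed.append,                    # record actions instead of driving a browser
)
```

In a real setup, `take_screenshot` and `execute` would be backed by a browser automation layer and `policy` by a call to the model; keeping the three as plain callables makes the loop easy to test in isolation.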