Microsoft releases Fara-7B, its first 7B-parameter computer-use agent model

AIMPACT News, May 16 (UTC+8): Microsoft has released Fara-7B, its first small language model, with 7 billion parameters, designed specifically for computer use. The model uses a multimodal decoder architecture that takes screenshot images and text context as input and directly predicts a chain of thought followed by an action with its parameters. It is built on Qwen 2.5-VL (7B), supports a 128k context length, was trained over 2.5 days on 64 H100 GPUs, and was released under the MIT license on November 24, 2025. Fara-7B perceives the browser through screenshots and combines its internal reasoning with a record of past states to predict the next action and its parameters (such as click coordinates). Training relies on large-scale, fully synthetic datasets. The model can plan and execute high-level tasks such as booking restaurants, applying for jobs, and planning trips. For safety alignment, it uses robust fine-tuning methods, can recognize critical points, can refuse seven categories of policy-violating tasks, and pauses at critical points such as entering personal information or completing a purchase. Users can deploy and interact with it through its GitHub repository, vLLM, and the fara-cli tool; its main application is automating web tasks. (Source: InfoQ)
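The loop described above (take a screenshot, reason over the history, predict one action with its parameters, pause at a critical point) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Action`, `AgentState`, `run_episode`, and `scripted_policy` are hypothetical names, the policy is a hard-coded stand-in for the model, and none of this reflects Fara-7B's actual API or output format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One predicted step (hypothetical schema, not Fara-7B's real format)."""
    name: str                                   # e.g. "click", "type", "terminate"
    args: dict = field(default_factory=dict)    # e.g. {"x": 412, "y": 230}
    thought: str = ""                           # reasoning emitted before the action
    critical: bool = False                      # True if user consent is required


@dataclass
class AgentState:
    task: str
    history: list = field(default_factory=list)  # (status, Action) records


def run_episode(task: str,
                take_screenshot: Callable[[], bytes],
                policy: Callable[[bytes, AgentState], Action],
                execute: Callable[[Action], None],
                confirm: Callable[[Action], bool],
                max_steps: int = 20) -> AgentState:
    """Observe, predict, and act until the task ends or a critical point pauses it."""
    state = AgentState(task=task)
    for _ in range(max_steps):
        screenshot = take_screenshot()          # the only view of the browser
        action = policy(screenshot, state)      # stand-in for the model call
        if action.name == "terminate":
            state.history.append(("done", action))
            return state
        if action.critical and not confirm(action):
            # Stop before entering personal data, completing a purchase, etc.
            state.history.append(("paused", action))
            return state
        execute(action)
        state.history.append(("executed", action))
    return state


def scripted_policy(screenshot: bytes, state: AgentState) -> Action:
    """Hard-coded stand-in for the model: replays a fixed three-step plan."""
    script = [
        Action("click", {"x": 412, "y": 230}, thought="Open the booking form"),
        Action("type", {"text": "2 people, 7pm"}, thought="Fill in the details"),
        Action("click", {"x": 500, "y": 640}, thought="Submit payment", critical=True),
    ]
    step = len(state.history)
    return script[step] if step < len(script) else Action("terminate")


if __name__ == "__main__":
    executed = []
    final = run_episode("book a table", lambda: b"", scripted_policy,
                        executed.append, confirm=lambda a: False)
    print([status for status, _ in final.history])
```

In a real deployment the policy would call the served model and `execute` would drive a browser; the point of the sketch is only that the critical-point check sits between prediction and execution, so a declined confirmation leaves the risky action unexecuted.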
Comment
GateUser-16838403
· 51m ago
A 2.5-day training run. Microsoft's efficiency is a bit terrifying.
GateUser-53a6e1a8
· 5h ago
Safety alignment that can refuse policy-violating tasks makes it more reliable than AutoGPT in this regard.
TheBluePeony'sProphecy
· 5h ago
Qwen 2.5-VL has a solid foundation, but the multi-modal Agent track is going crazy.
SeaSaltFlavorAirdrop
· 5h ago
In web automation, the Frankenstein patchwork of Playwright + LLMs is putting its creators out of a job.
GateUser-4bd1cc87
· 5h ago
The MIT license is welcome, and a 7B-parameter model can now run locally.
GlassCityAfterTheRain
· 5h ago
Is deploying fara-cli simple? Is there a Docker image available?
GateUser-8da82d63
· 5h ago
It was trained on fully synthetic data, so its generalization ability is questionable. Waiting for real-world testing.
LateAlphaCourier
· 5h ago
128k context should be enough for me to fit the entire webpage inside.
AirdropUnderTheNeonBridge
· 5h ago
Predicting coordinates directly from a screenshot plus text: browser automation is about to change.
CandleChaser
· 5h ago
Running 64 H100s for two and a half days, I can't even calculate the cost.