Microsoft releases Fara-7B, its first 7B-parameter agent model for computer use

AIMPACT News, May 16 (UTC+8): Microsoft has released Fara-7B, its first small language model designed specifically for computer use, with 7 billion parameters. The model uses a multimodal decoder architecture that takes screenshot images and text context as input and directly predicts its reasoning (a chain of thought) together with a parameterized action. It is built on Qwen 2.5-VL (7B), supports a 128k context length, was trained over 2.5 days on 64 H100 GPUs, and was released under the MIT license on November 24, 2025. Fara-7B perceives the browser through screenshots and combines its internal reasoning with a record of past states to predict the next action and its parameters (such as click coordinates). Training relies on a large-scale, fully synthetic dataset. The model can plan and carry out multi-step tasks such as booking restaurants, applying for jobs, and planning trips. For safety alignment, it uses robust fine-tuning methods, can recognize critical points, refuses seven categories of policy-violating tasks, and pauses at critical points such as entering personal information or completing a purchase. Users can deploy and interact with the model through GitHub, vLLM, and the fara-cli tool; it is mainly applied to automating web tasks. (Source: InfoQ)
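The loop described above (observe a screenshot, reason over the history, predict one parameterized action, pause at critical points) can be sketched roughly as follows. This is a minimal illustration only, not Fara-7B's actual interface: `Action`, `Step`, `run_agent`, and `scripted_predictor` are hypothetical names, and the predictor is a hard-coded stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One predicted step (hypothetical schema, not Fara-7B's real output format)."""
    name: str                                  # e.g. "click", "type", "terminate"
    args: dict = field(default_factory=dict)   # e.g. {"x": 120, "y": 340}
    thought: str = ""                          # reasoning emitted before the action
    critical: bool = False                     # True when the user should approve first


@dataclass
class Step:
    screenshot: bytes
    action: Action


def run_agent(
    task: str,
    take_screenshot: Callable[[], bytes],
    predict: Callable[[str, bytes, List[Step]], Action],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Step]:
    """Observe, predict, and act until the model terminates or the user declines."""
    history: List[Step] = []
    for _ in range(max_steps):
        shot = take_screenshot()                # perceive the browser
        action = predict(task, shot, history)   # next action from task + screenshot + history
        if action.name == "terminate":
            break
        if action.critical and not confirm(action):
            break                               # stop at a critical point without acting
        execute(action)
        history.append(Step(shot, action))
    return history


def scripted_predictor(task: str, screenshot: bytes, history: List[Step]) -> Action:
    """Stand-in for the model: replays a fixed script indexed by steps taken so far."""
    script = [
        Action("click", {"x": 120, "y": 340}, thought="Open the booking form"),
        Action("type", {"text": "Jane Doe"}, thought="Fill in the name", critical=True),
        Action("terminate"),
    ]
    return script[min(len(history), len(script) - 1)]


if __name__ == "__main__":
    steps = run_agent(
        "book a table",
        take_screenshot=lambda: b"fake-png-bytes",
        predict=scripted_predictor,
        execute=lambda a: print("executing", a.name, a.args),
        confirm=lambda a: True,  # auto-approve critical steps in this demo
    )
    print(len(steps), "steps executed")
```

In a real deployment the `predict` callback would send the screenshot and history to a served model, and `execute` would drive a browser; both are left abstract here because the article does not describe those interfaces.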
Comment
BoredInBlockspace
· 4m ago
128k context length is indeed enough for web automation, and for long-running tasks you won’t have to worry about forgetting the earlier context.
MintConditionHuman
· 6h ago
The browser automation track is becoming more competitive, and after AutoGPT, there's another contender.
BlocktimeBarista
· 6h ago
Predicting coordinates is quite important; previously, many models had embarrassingly poor accuracy in locating elements.
RugCheckSkeptic
· 6h ago
Will a model trained on fully synthetic data fail to generalize to complex real-world pages?
QuietValidator
· 6h ago
The MIT license is a big plus; finally, no commercial restriction clauses to wade through.
ColdWalletUnderTheNeonLights
· 6h ago
How’s the deployment experience with fara-cli? Can anyone who has tried it share the pitfalls they ran into?
LateBlockLarry
· 6h ago
Trained in 2.5 days on 64 H100s: that efficiency is something else. Synthetic data is doing a great job.