Microsoft has released Fara-7B, a 7-billion-parameter multimodal agent model designed specifically for computer use. It takes screenshots and text as input and directly predicts its reasoning (chain of thought) and actions with their parameters. The model is built on Qwen 2.5-VL, has a 128k context window, was trained on 64 H100 GPUs for 2.5 days, and is released under the MIT license. It perceives the browser through screenshots and combines its reasoning with the history of prior steps to predict the next action and its parameters, such as click coordinates. Training relied on large-scale, fully synthetic data. The model can plan and execute advanced tasks, and post-training safety alignment lets it refuse inappropriate tasks and pause at critical points. It can be deployed and used through the GitHub repository, vLLM, and fara-cli to automate web tasks.

MeNews

2026-05-26 18:47:37


AIMPACT News, May 16 (UTC+8): Microsoft has released Fara-7B, its first small language model designed specifically for computer use, with 7 billion parameters. The model uses a multimodal decoder architecture that takes screenshot images and text context as input and directly predicts its reasoning (chain of thought) and actions with their parameters. It is built on Qwen 2.5-VL (7B), supports a 128k context length, was trained for 2.5 days on 64 H100 GPUs, and was released under the MIT license on November 24, 2025. Fara-7B perceives the browser through screenshots and combines its internal reasoning with a record of prior steps to predict the next action and its parameters (such as click coordinates). Training relied on large-scale, fully synthetic datasets. The model can plan and execute advanced tasks such as booking restaurants, applying for jobs, and planning trips. For safety alignment, it is fine-tuned to recognize critical points, can refuse seven categories of policy-violating tasks, and pauses at critical points such as entering personal information or completing a purchase. Users can deploy and interact with the model through the GitHub repository, vLLM, and the fara-cli tool, mainly to automate web tasks. (Source: InfoQ)
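The observe, reason, act cycle described above, including the pause at critical points, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Fara-7B's actual interface: the `Step` type, the `run_agent` function, the action names, and the scripted stand-in for the model are all hypothetical and exist only to show the control flow.

```python
from dataclasses import dataclass, field

# Hypothetical action names; the real model's action space may differ.
CRITICAL_ACTIONS = {"enter_personal_info", "complete_purchase"}


@dataclass
class Step:
    thought: str                               # reasoning behind this step
    action: str                                # e.g. "click", "type", "terminate"
    args: dict = field(default_factory=dict)   # e.g. {"x": 412, "y": 230}


def run_agent(task, predict, take_screenshot, execute, confirm, max_steps=20):
    """Run an observe-think-act loop and return the steps that were executed."""
    history = []
    for _ in range(max_steps):
        screenshot = take_screenshot()             # observe the browser
        step = predict(task, screenshot, history)  # thought + parameterized action
        if step.action == "terminate":
            break
        # Pause at a critical point unless the user explicitly approves it.
        if step.action in CRITICAL_ACTIONS and not confirm(step):
            break
        execute(step)
        history.append(step)
    return history


# Scripted stand-in for the model (an assumption for this demo only).
PLAN = [
    Step("Open the booking form", "click", {"x": 412, "y": 230}),
    Step("Fill in the party size", "type", {"text": "2"}),
    Step("Pay for the reservation", "complete_purchase"),
]


def scripted_predict(task, screenshot, history):
    if len(history) < len(PLAN):
        return PLAN[len(history)]
    return Step("Task finished", "terminate")


executed = []
history = run_agent(
    "Book a table for two",
    scripted_predict,
    take_screenshot=lambda: b"",   # placeholder for real screenshot bytes
    execute=executed.append,       # placeholder for a real browser driver
    confirm=lambda step: False,    # user declines, so the agent pauses
)
```

In this run the loop executes the click and type steps and then stops before the purchase, because the confirmation callback declines. Passing a callback that returns `True` would let the purchase step through.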




Pragmatists

· 3h ago

With only 7B parameters, inference costs are manageable, and small to medium teams can also participate.


ReflectionsOnTheStreetCorner

· 6h ago

A 7B multimodal agent; local deployment enthusiasts are thrilled.


YieldTuningFork

· 6h ago

Microsoft's open-source game is on full display; the MIT license is really appealing.


OracleSkeptic

· 6h ago

Training on fully synthetic data is quite interesting; I’ve figured out how to close the data loop.


TheProphetOfToast

· 6h ago

Built on Qwen 2.5-VL—China-made base models are really delivering!

