Microsoft releases Fara-7B, its first 7B-parameter agent model for computer use

Microsoft has released Fara-7B, a 7-billion-parameter multimodal agent designed specifically for computer use. Built on Qwen 2.5-VL with a 128k context window, it takes screenshots and text as input and directly predicts a chain of thought along with parameterized actions. It was trained for 2.5 days on 64 H100 GPUs, relies on large-scale fully synthetic data, and is released under the MIT license. The model perceives the browser through screenshots and combines reasoning with its action history to decide the next action and its parameters, such as click coordinates. It can plan and carry out complex tasks, and post-training safety alignment lets it refuse inappropriate tasks and pause at critical points. It can be deployed and used via GitHub, vLLM, and fara-cli to automate web tasks.

MeNews

2026-05-27 00:32:37


AIMPACT News, May 16 (UTC+8): Microsoft has released Fara-7B, its first small language model built specifically for computer use, with 7 billion parameters. The model uses a multimodal decoder architecture: it takes screenshot images and text context as input and directly predicts a chain of thought along with parameterized actions. It is built on Qwen 2.5-VL (7B), supports a 128k context length, was trained for 2.5 days on 64 H100 GPUs, and was released under the MIT license on November 24, 2025.

Fara-7B perceives the browser through screenshots, combining internal reasoning with a record of its earlier states to predict the next action and its parameters (such as click coordinates). Training relies on a large-scale, fully synthetic dataset. The model can plan and carry out complex tasks such as booking restaurants, applying for jobs, and planning trips.

For safety alignment, it uses robust fine-tuning methods and can recognize critical points: it refuses seven categories of policy-violating tasks and pauses at critical stopping points, such as before entering personal information or completing a purchase. Users can deploy and interact with the model through its GitHub repository, vLLM, and the fara-cli tool; its main application is automating web tasks. (Source: InFoQ)
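The perceive, reason, act cycle described above can be pictured as a simple loop. The sketch below is hypothetical: `run_agent`, the `policy` callable standing in for the model, the action names, and the `critical` flag are all illustrative assumptions, not Fara-7B's actual interface or output format. It only shows how a screenshot-driven agent might feed its history back into each prediction and stop at a critical point unless a human confirms.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    name: str                                 # hypothetical names: "click", "type", "terminate"
    args: dict = field(default_factory=dict)  # e.g. {"x": 412, "y": 230} for a click
    critical: bool = False                    # assumed flag: personal data entry, purchase, etc.


@dataclass
class Step:
    thought: str    # the reasoning text produced for this step
    action: Action  # the parameterized action chosen for this step


# Hypothetical model interface: (screenshot, task, history) -> next Step.
Policy = Callable[[bytes, str, List[Step]], Step]


def run_agent(
    task: str,
    policy: Policy,
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    confirm: Callable[[Step], bool],
    max_steps: int = 20,
) -> Tuple[str, List[Step]]:
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()            # perceive: the screenshot is the only observation here
        step = policy(screenshot, task, history)  # reason over task + history, predict next action
        if step.action.critical and not confirm(step):
            return "paused", history              # stop at a critical point without acting
        if step.action.name == "terminate":
            history.append(step)
            return "done", history
        execute(step.action)                      # act in the browser
        history.append(step)                      # record state for the next prediction
    return "max_steps", history


# Demo with a scripted stand-in for the model (all values are made up).
script = [
    Step("Open the booking form", Action("click", {"x": 412, "y": 230})),
    Step("Enter the user's phone number", Action("type", {"text": "555-0100"}, critical=True)),
    Step("Task complete", Action("terminate")),
]


def scripted_policy(screenshot: bytes, task: str, history: List[Step]) -> Step:
    return script[len(history)]


executed: List[Action] = []
status, hist = run_agent("Book a table for two", scripted_policy,
                         lambda: b"", executed.append, lambda step: False)
print(status, len(hist))
```

Because `confirm` declines, the demo stops before typing the phone number: only the click is executed and the status is `"paused"`. Keeping the pause check before `execute` is the design choice that matters; a pause that fires after the action would be too late.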



Comments

MintCondition

· 3h ago

Post-training safety alignment plus pausing at critical points: this design clearly reflects lessons learned.


DepegDaydream

· 3h ago

Training on fully synthetic data closes the data loop, so the cost of later iterations will keep falling.


BlueberryStakingMachine

· 4h ago

Handling screenshots and text at the same time: multimodality is finally a necessity rather than a gimmick.


LatencyMonk

· 4h ago

Training on 64 H100s for 2.5 days: the cost is lower than I expected.


BridgeAnxiety

· 5h ago

Predicting coordinates and parameters directly is crucial; with GPT-4V, I still had to do the post-processing myself.


YieldBento

· 5h ago

fara-cli lets you interact directly from the command line, which tech enthusiasts will love. I'll try it tomorrow.


BluePeonyDoesn'tDrop

· 5h ago

It can refuse policy violations and pause proactively; this safety alignment is more meticulous than that of some closed-source models.


PurpleMistLily

· 5h ago

With a 128k context and screenshot-based perception, browser automation no longer requires writing a pile of XPath.


LonelyStoneUnderTheAurora

· 5h ago

MIT License means commercial use and modifications are allowed; domestic shell companies are ready.


IdleFishDaoMember

· 5h ago

Qwen 2.5-VL base model plus fully synthetic data: synthetic data pipelines are becoming increasingly mainstream.

