Fara-7B predicts click coordinates directly from screenshots and text, effectively giving AI eyes and hands. Its open-source release under the MIT license is a game-changer.

MeNews
Microsoft releases Fara-7B, its first 7B-parameter agent model for computer use
Microsoft has released Fara-7B, a 7-billion-parameter multimodal agent model designed specifically for computer use. It processes screenshots and text together and directly predicts a chain of reasoning followed by an action with its parameters. The model is built on Qwen 2.5-VL, has a 128k-token context window, was trained for 2.5 days on 64 H100 GPUs, and is released under the MIT license.

Fara-7B perceives the browser through screenshots. At each step it combines its reasoning with the history of previous steps to predict the next action and that action's parameters, such as click coordinates. It was trained on large-scale, fully synthetic data. It can plan and carry out complex tasks, and its post-training safety alignment lets it refuse inappropriate tasks and pause at critical points for user confirmation. For automating web tasks, it can be deployed from GitHub, served with vLLM, and used through fara-cli.
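The perceive, reason, act loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Fara-7B's actual interface: the Action and Step types, the action names ("click", "type", "pause", "terminate"), and run_agent are all hypothetical, and the model is replaced by a scripted stand-in policy. It shows only the control flow: take a screenshot, let the policy return a thought plus a parameterized action given the history, execute the action, and stop on completion or at a critical point.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    # Hypothetical action names; Fara-7B's real action space may differ.
    name: str
    coords: Optional[Tuple[int, int]] = None  # e.g. pixel (x, y) for a click
    text: Optional[str] = None                # e.g. text to type


@dataclass
class Step:
    thought: str   # the model's reasoning for this step
    action: Action


# A policy maps (task, screenshot, history) to the next Step.
Policy = Callable[[str, bytes, List[Step]], Step]


def run_agent(task: str, policy: Policy,
              take_screenshot: Callable[[], bytes],
              execute: Callable[[Action], None],
              max_steps: int = 10) -> List[Step]:
    """Run a hypothetical screenshot-driven agent loop."""
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()              # perceive
        step = policy(task, screenshot, history)    # reason + choose action
        history.append(step)
        if step.action.name in ("terminate", "pause"):
            # "pause" stands in for a critical point: control returns to the user.
            break
        execute(step.action)                        # act
    return history


def scripted_policy(task: str, screenshot: bytes, history: List[Step]) -> Step:
    """Stand-in for the model: replays a fixed script, one step per call."""
    script = [
        Step("The search box is near the top.", Action("click", coords=(640, 120))),
        Step("Enter the query.", Action("type", text="weather Seattle")),
        Step("The results are visible; the task is done.", Action("terminate")),
    ]
    return script[len(history)]


executed: List[Action] = []
steps = run_agent("check the weather", scripted_policy, lambda: b"", executed.append)
for s in steps:
    print(s.action.name, s.action.coords or s.action.text or "")
```

Keeping the history in the loop mirrors the description of the model combining reasoning with historical state; a real deployment would replace scripted_policy with a call to the served model and execute with browser automation.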
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.