With just a 7B-parameter model, you can control a browser, and it is fully open-sourced under the MIT license. Microsoft is being rather bold here.

MeNews
Microsoft releases Fara-7B, its first 7B-parameter agent model for computer use
Microsoft has released Fara-7B, a 7-billion-parameter multimodal agent model designed specifically for computer use. It takes screenshots and text as input and directly outputs a chain of thought followed by an action with concrete parameters. The model is built on Qwen2.5-VL, supports a 128k-token context window, was trained on 64 H100 GPUs for 2.5 days, and is released under the MIT license.

Fara-7B perceives the browser through screenshots. It reasons over the current screen and the history of its previous steps to predict the next action along with that action's parameters, such as click coordinates. Its training relies on large-scale, fully synthetic data. The model can plan and carry out complex, multi-step tasks, and its post-training includes robust safety alignment that lets it refuse inappropriate tasks and pause at critical points. It is available on GitHub and can be deployed and used through vLLM and fara-cli to automate tasks on web pages.
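The loop described above (screenshot in, reasoning plus a parameterized action out, history carried forward) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Fara-7B's actual interface: the `<tool_call>` JSON output format, the action names (`click`, `type`, `terminate`, `pause_for_user`), and the helpers `parse_output` and `run_agent` are all hypothetical. The model is replaced by a scripted stub so the control flow runs without a GPU.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# HYPOTHETICAL output format: free-text reasoning followed by a JSON tool call
# wrapped in <tool_call>...</tool_call>. Assumed for illustration only.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


@dataclass
class Action:
    name: str                                 # hypothetical names, e.g. "click"
    args: dict = field(default_factory=dict)  # e.g. {"x": 412, "y": 230}


def parse_output(text: str) -> tuple[str, Action]:
    """Split raw model text into (thought, action) under the assumed format."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        raise ValueError("no tool call found in model output")
    thought = text[: match.start()].strip()
    call = json.loads(match.group(1))
    return thought, Action(call["name"], call.get("arguments", {}))


# Actions that end the loop: the task is done, or the agent hits a
# "critical point" and hands control back to the user (hypothetical names).
STOP_ACTIONS = {"terminate", "pause_for_user"}


def run_agent(
    task: str,
    model: Callable[[str, bytes, list], str],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> tuple[list, Optional[Action]]:
    """Observe-think-act loop: screenshot in, (thought, action) out, repeat."""
    history: list = []  # (thought, action) pairs fed back to the model
    for _ in range(max_steps):
        screenshot = take_screenshot()
        thought, action = parse_output(model(task, screenshot, history))
        history.append((thought, action))
        if action.name in STOP_ACTIONS:
            return history, action  # stop without executing the stop action
        execute(action)
    return history, None  # step budget exhausted


# Usage with a scripted stand-in for the model and a no-op browser.
scripted = iter([
    'The search box is near the top. <tool_call>{"name": "click", "arguments": {"x": 412, "y": 230}}</tool_call>',
    'Type the query. <tool_call>{"name": "type", "arguments": {"text": "Fara-7B"}}</tool_call>',
    'Checkout needs payment details; stop here. <tool_call>{"name": "pause_for_user", "arguments": {}}</tool_call>',
])
executed: list = []
history, stop = run_agent(
    "search for Fara-7B",
    model=lambda task, shot, hist: next(scripted),
    take_screenshot=lambda: b"",
    execute=executed.append,
)
print([a.name for _, a in history])  # ['click', 'type', 'pause_for_user']
print(stop.name)                     # pause_for_user
```

Returning on a pause action instead of executing it is one simple way to model "pausing at critical points"; in a real setup, `model`, `take_screenshot`, and `execute` would be wired to a served model and a browser driver, whose exact interfaces are not covered here.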