Microsoft releases Fara-7B, its first 7B-parameter computer-use agent model
Microsoft has released Fara-7B, a 7-billion-parameter multimodal agent model designed for computer use. It takes screenshots and text as input and directly predicts its reasoning (a chain of thought) together with the next action and that action's arguments, such as click coordinates. The model is built on Qwen 2.5-VL, has a 128k-token context window, was trained on 64 H100 GPUs for 2.5 days using large-scale, fully synthetic data, and is released under the MIT license. It perceives the browser through screenshots and combines its reasoning with the history of previous steps to decide what to do next. Fara-7B can plan and execute high-level tasks, and its post-training safety alignment lets it refuse inappropriate tasks and pause at critical points. It is available through GitHub, and can be deployed with vLLM and operated through fara-cli to automate web tasks.
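The screenshot, reasoning, and action cycle described above can be pictured as a simple loop. The following is a minimal sketch, not Fara-7B's actual interface: the names `Action`, `Step`, `run_episode`, and `scripted_policy`, the action names, and the coordinates are all hypothetical, and a scripted stand-in takes the place of the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action format: a name plus its arguments (e.g. coordinates).
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Step:
    thought: str    # the reasoning text predicted before acting
    action: Action  # the action chosen for this step


# A policy maps (task, latest screenshot, history of past steps) to the next step.
Policy = Callable[[str, bytes, List[Step]], Step]


def run_episode(task: str,
                policy: Policy,
                take_screenshot: Callable[[], bytes],
                execute: Callable[[Action], None],
                max_steps: int = 10) -> List[Step]:
    """Observe, reason, and act until the policy terminates or max_steps is hit."""
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()            # perceive the browser
        step = policy(task, screenshot, history)  # reason over screenshot + history
        history.append(step)
        if step.action.name == "terminate":       # hypothetical stop action
            break
        execute(step.action)                      # apply the action to the browser
    return history


def scripted_policy(task: str, screenshot: bytes, history: List[Step]) -> Step:
    """Stand-in for the model (an assumption): click once, then stop."""
    if not history:
        return Step("The search box is near the top of the page.",
                    Action("click", {"x": 320, "y": 48}))
    return Step("The task is complete.", Action("terminate"))
```

In a real deployment, the stand-in policy would be replaced by a call to the served model, and `take_screenshot` and `execute` by a browser automation layer. Keeping those three pieces as plain callables is only a design choice of this sketch, made so the loop can be run without a browser.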