Microsoft open-sources Phi-Ground: a 4-billion-parameter model achieves higher click accuracy than Operator and Claude
According to Beating Monitoring, Microsoft has open-sourced the Phi-Ground model family, which is designed specifically to solve the "where on the screen" problem when AI controls a computer: given a screenshot and an instruction, the model outputs precise click coordinates. Paired with a large model for instruction planning, the open-source 4-billion-parameter version achieved higher click accuracy than OpenAI Operator and Claude Computer Use on the Showdown benchmark, and took first place in all five evaluations, including ScreenSpot-Pro, among models with fewer than 4 billion parameters.
The team conducted large-scale validation with more than 40 million data points and found that three training techniques common in earlier academic papers all failed once the dataset grew. The truly effective method is simple: output coordinates directly as ordinary numbers, such as "523, 417". Several prior papers invented special location-word vocabularies for coordinates, hoping the model would speak coordinates the way it speaks words, but at large scale these new tokens could not be learned properly and instead caused the model to collapse. The other key is to place the text instruction before the image. Large models process input in one direction: a model that reads "click the blue settings icon" before looking at the image already knows what to search for while it processes the pixels; a model that looks at the image first can only scan blindly, and performance is much worse.
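The two findings above can be sketched in a few lines. This is a hypothetical illustration, not Phi-Ground's actual code: the prompt template, the `<image>` placeholder token, and the output format are assumptions, but they show the ideas of instruction-first ordering and plain-number coordinates.

```python
# Sketch of the two training findings described above (hypothetical
# template, not the actual Phi-Ground prompt format).

def build_prompt(instruction: str, image_token: str = "<image>") -> str:
    # Instruction FIRST, image second: a one-directional (causal) model
    # that has already read the instruction knows what to look for while
    # it processes the image patches.
    return f"Instruction: {instruction}\n{image_token}\nAnswer:"

def parse_coordinates(output: str) -> tuple[int, int]:
    # The model emits coordinates as ordinary digit tokens, e.g.
    # "523, 417" -- no special location vocabulary to learn.
    x_str, y_str = output.split(",")
    return int(x_str), int(y_str)

prompt = build_prompt("click the blue settings icon")
click = parse_coordinates("523, 417")  # -> (523, 417)
```

Note that plain digits reuse tokens the model already handles well from pretraining, which is exactly why they remain stable at scale while newly invented position tokens do not.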
The team also found that reinforcement learning helps even on purely visual tasks. The approach is to have the model make multiple click predictions on the same image, then contrast correct and incorrect points for comparison-based training (a method called DPO, a form of reinforcement learning). Even after the model has been thoroughly fine-tuned, this step still significantly improves accuracy. Reinforcement learning had previously been used mainly for language tasks that require reasoning, so its effectiveness on a perceptual "look at the picture and decide where to click" task was an unexpected gain.

To address buttons being tiny on 4K high-resolution screens (a button may occupy only 0.07% of the screen area), the team proportionally shrank screenshots during training and pasted them onto a large white canvas, simulating real high-resolution displays where elements are extremely small. This trick is especially effective in complex professional software such as Photoshop.
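The DPO data construction described above can be sketched as follows. This is a minimal illustration under assumed names (`inside`, `make_dpo_pairs`, and the bounding-box format are not from the paper): sample several click predictions per screenshot, label each by whether it lands inside the target element's box, and pair correct ("chosen") with incorrect ("rejected") points.

```python
# Hypothetical sketch of building DPO preference pairs for click
# grounding: correct clicks become "chosen", misses become "rejected".

def inside(point, box):
    # box is (x0, y0, x1, y1); return True if the click lands in it
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def make_dpo_pairs(predictions, target_box):
    chosen = [p for p in predictions if inside(p, target_box)]
    rejected = [p for p in predictions if not inside(p, target_box)]
    # every (correct, incorrect) combination becomes one training pair
    return [(c, r) for c in chosen for r in rejected]

pairs = make_dpo_pairs(
    predictions=[(523, 417), (100, 50), (530, 420)],
    target_box=(500, 400, 560, 440),
)
# two correct points x one miss -> 2 preference pairs
```

The DPO loss then pushes the model's probability toward each chosen point and away from its paired rejected point, which is why the step helps even after supervised fine-tuning has converged.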
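The shrink-and-paste augmentation can be sketched with plain coordinate math. The canvas size, scale, and offset values here are illustrative assumptions, not the paper's settings; the point is how the ground-truth click is remapped and how small the element becomes on a simulated 4K canvas.

```python
# Illustrative sketch (not the paper's exact pipeline) of the 4K
# augmentation: shrink a screenshot by `scale`, paste it at `offset`
# on a large white canvas, and remap the ground-truth click.

CANVAS_W, CANVAS_H = 3840, 2160  # simulated 4K display (assumption)

def augment_click(click_xy, scale, offset=(0, 0)):
    # After shrinking the screenshot and pasting it at `offset`,
    # the ground-truth click moves to scale * xy + offset.
    x, y = click_xy
    ox, oy = offset
    return (round(x * scale) + ox, round(y * scale) + oy)

def element_area_fraction(elem_wh, scale):
    # Fraction of the canvas a shrunken element covers -- the training
    # signal now involves far smaller targets than the raw screenshot.
    w, h = elem_wh
    return (w * scale) * (h * scale) / (CANVAS_W * CANVAS_H)

new_click = augment_click((523, 417), scale=0.4, offset=(1200, 600))
frac = element_area_fraction((64, 32), scale=0.4)  # well under 0.01%
```

In a real pipeline the paste itself would be done with an image library (e.g. Pillow's `Image.resize` and `Image.paste`); only the coordinate bookkeeping is shown here.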