Google’s two models overtake the hardliners in the toughest scenarios: TERMS-Bench turns AI negotiation into a bankruptcy stress test
According to Beating Monitoring, Stanford’s Erica Zhang and colleagues have released TERMS-Bench, a benchmark for economic negotiation.
It drops the black-box “large model judge,” so evaluators can see directly whether a model lost because of its bids, its concessions, or rule violations.
In the standard tests, Claude Opus 4.6 and Zhipu GLM 5.1 took the top two spots.
The paper finds that both adopt a hardline “bid high, concede nothing” strategy, which can squeeze opponents when a deal leaves ample room for profit.
But in the highest-difficulty scenarios, where profit margins are extremely narrow, the hardline strategy backfires because negotiations frequently break down.
Here the leaderboard is upended: Gemma 4 31B (an open-weight model) and Gemini 3.1 Pro, which make moderate concessions to close deals, jump to the top two spots,
while the previous leaders fall back, with Claude dropping to fifth place and GLM to ninth.
Beyond the extreme-difficulty setting, the benchmark’s most striking feature is Bankroll mode, which tests a model’s ability to survive.
A single negotiation is extended into continuous procurement: each agent starts with $100 and negotiates for 50 rounds, a fixed operating cost is deducted every round, and the agent goes bankrupt if its funds run out.
Here, even tiny negotiation mistakes compound into bankruptcy risk.
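To make the compounding concrete, here is a minimal Python sketch of a Bankroll-style loop. It is not the benchmark's actual code. The $100 starting cash and the 50 rounds come from the description above. Everything else is an assumption made for illustration: the fixed cost of $6 per round, the rule that a failed negotiation yields zero profit, the bankruptcy threshold of cash at or below zero, and the names `run_bankroll` and `BankrollResult`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BankrollResult:
    survived: bool
    rounds_played: int
    final_cash: float


def run_bankroll(
    profit_per_round: Callable[[int], float],
    start_cash: float = 100.0,   # from the article
    rounds: int = 50,            # from the article
    fixed_cost: float = 6.0,     # hypothetical value, not from the paper
) -> BankrollResult:
    """Play `rounds` negotiations, charging `fixed_cost` after each one.

    `profit_per_round(i)` returns the profit from round i
    (0.0 when the negotiation breaks down).
    """
    cash = start_cash
    for i in range(rounds):
        cash += profit_per_round(i) - fixed_cost
        if cash <= 0:  # assumed bankruptcy rule: funds have run out
            return BankrollResult(False, i + 1, cash)
    return BankrollResult(True, rounds, cash)


# Illustrative agents with made-up payoffs:
# a conceder that closes a modest deal every round,
# and a hardliner that wins big but closes only one round in ten.
steady = run_bankroll(lambda i: 12.0)
hardliner = run_bankroll(lambda i: 30.0 if i % 10 == 9 else 0.0)

print(steady)
print(hardliner)
```

With these made-up numbers, the steady agent finishes all 50 rounds with $400, while the hardliner, whose average profit of $3 per round is below the fixed cost, goes bankrupt in round 27.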
The results show that GLM 5.1, Claude Opus 4.6, and Google’s two models, despite their different strategies, all keep their finances under control: each achieves 100% survival, with final cash holdings between $380 and $443.
In contrast, Grok 4.20 and GPT-4o-mini cannot absorb the cash-flow losses, with bankruptcy rates of 25% and 50%, respectively.
The point of TERMS-Bench is not the deal success rate but the translation of negotiation errors into cash losses and bankruptcy risk.
Whether a model can persuade its opponent is only the first layer;
in continuous trading, what truly makes the difference is whether it can sustain profit and cash flow.