Google’s two models come from behind in the hardest scenarios: TERMS-Bench turns AI negotiation into a bankruptcy stress test


According to Beating Monitoring, Stanford’s Erica Zhang and others have released TERMS-Bench, a benchmark for economic negotiation.
It drops the black-box “large model judge,” so evaluators can see directly whether a model lost because of its bids, its concessions, or rule violations.
In the standard tests, Claude Opus 4.6 and Zhipu GLM 5.1 took the top two spots.
The paper found that both adopted a hardline strategy of “high bids, no concessions,” which extracts the most from opponents when there is ample room for profit.
But in the highest-difficulty scenarios, where profit margins are extremely narrow, the hardline strategy backfires because negotiations frequently break down.
The leaderboard is upended here: Gemma 4 31B (an open-weight model) and Gemini 3.1 Pro, which know how to make moderate concessions to close a deal, jump to the top two spots.
Meanwhile, of the previous leaders, Claude drops to fifth place and GLM drops to ninth.
Beyond the extreme-difficulty scenarios, the most striking part of this benchmark is its Bankroll mode, which tests whether a model can survive.
A single negotiation is extended into continuous procurement: each agent starts with $100 and negotiates 50 rounds, with fixed operational costs deducted each round, going bankrupt if funds run out.
Here, even tiny negotiation mistakes compound into bankruptcy risk.
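The compounding effect described above can be illustrated with a minimal sketch. Only the $100 starting cash and the 50 rounds come from the article; the function name `run_bankroll`, the per-round cost of $5, and the profit figures are hypothetical values chosen for illustration, not numbers from TERMS-Bench.

```python
from dataclasses import dataclass


@dataclass
class BankrollResult:
    survived: bool
    rounds_played: int
    final_cash: float


def run_bankroll(profits, start_cash=100.0, fixed_cost=5.0):
    """Replay a sequence of per-round negotiation profits.

    start_cash follows the article ($100); fixed_cost is an assumed
    value, since the article does not state the per-round cost.
    A round with no deal is modeled as a profit of 0.
    """
    cash = start_cash
    for round_no, profit in enumerate(profits, start=1):
        # Operating cost is deducted every round, deal or no deal.
        cash += profit - fixed_cost
        if cash <= 0:
            # Funds have run out: the agent is bankrupt.
            return BankrollResult(False, round_no, cash)
    return BankrollResult(True, len(profits), cash)


# An agent that closes a modest deal every round (hypothetical $11 profit).
steady = run_bankroll([11.0] * 50)

# An agent whose negotiations always break down (no profit at all).
stalled = run_bankroll([0.0] * 50)

print(steady)
print(stalled)
```

Under these assumed numbers, the agent that never closes a deal runs out of cash well before round 50, while the one that settles for a modest profit each round steadily builds its balance.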
The results show that GLM 5.1, Claude Opus 4.6, and Google’s two models, despite their different strategies, all kept risk under control: each achieved a 100% survival rate, with final cash holdings between $380 and $443.
In contrast, Grok 4.20 and GPT-4o-mini could not absorb their cash-flow losses, with bankruptcy rates of 25% and 50%, respectively.
The point of TERMS-Bench is not the deal success rate, but translating negotiation errors into cash losses and bankruptcy risk.
Whether a model can persuade its opponent is only the first layer;
in continuous trading, whether it can sustain profit and cash flow is what truly makes the difference.
