GLM-5.2 is the benchmark king.


It's the first open-weight model to take #1 in multiple categories, and it's beating frontier models on several of these benchmarks.
#1 Wins:
→ Design Arena: ~1360 Elo, first open-weight model to take #1, beats Fable 5 by ~10 Elo
→ Terminal-Bench 2.1: 81.0% (best run 82.7%), first open model to cross 80%
→ Artificial Analysis Intelligence Index v4.1: Top open-weight model, score 51
→ GDPval-AA v2: Leading open-weight model, competitive with or ahead of GPT-5.5
→ LiveBench Agentic Coding: #1–2 open overall
Top-3 Rankings:
→ FrontierSWE (Dominance): #3 overall, 74.4% (near-tie with Opus 4.8's 75.1%, beats GPT-5.5)
→ SWE-bench Pro: #1 among open models, 62.1% (beats GPT-5.5's 58.6%)
→ MCP-Atlas (tool usage): ~77.0, in or near the top 3
→ Humanity's Last Exam (with tools): ~54.7, beats GPT-5.5
→ BenchLM leaderboard: #3–4 of 124 models
→ Code/Agent Arena (Frontend): #2 overall, behind only Fable
→ PostTrainBench: #2 overall, behind Opus 4.8, beats GPT-5.5
Insane resume.