Asked GPT Image 2.0 to create a benchmark table of Opus 4.7 vs GPT-5.5.
That image model got really good.
GPT-5.5 wins the headline scoreboard. But look closer.
OSWorld: 78.7 vs 78.0. GDPval: 84.9 vs 80.3. Toolathlon: 55.6 vs 54.6 (and that win is over GPT-5.4, not Opus).
Opus still takes SWE-Bench Pro, MCP Atlas, GPQA Diamond, HLE no-tools.
OpenAI gets the all-around belt. Anthropic keeps the coding crown. On paper.