What model is the best at poker?


Benchmarks are great, but they're not fun, so I wanted to put models in head-to-head competition
Background: a few weekends ago I built an agent poker engine and wanted to see which agent was better - Hermes or OpenClaw
Hermes won the first match, then I had them play 100 matches (not hands) of heads-up Texas Hold'em
The result? Exactly 50-50. Neither is decisively better out of the box
I used a variety of models across the 100 matches to mix it up and noticed some trends, so last night I ran a tournament to see which MODEL was best at poker
Here's how it worked:
> 8 models
> model vs model in heads-up play
> best-of-7 series to determine winner
> each match played until either one model was bankrupt or 100 hands were played
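The format above can be sketched in a few lines of Python. This is a rough illustration, not the actual engine: `play_hand` is a hypothetical stand-in for whatever plays a single hand, and the starting stack of 1000, the "bigger stack wins at the 100-hand cap" rule, and replaying tied matches are all assumptions the post does not spell out

```python
import random

def play_match(play_hand, starting_stack=1000, max_hands=100):
    """Play one heads-up match and return 'a', 'b', or 'tie'.

    play_hand(stacks) is a hypothetical callback returning the chips
    that move from player b to player a (negative means a pays b).
    """
    stacks = {"a": starting_stack, "b": starting_stack}
    for _ in range(max_hands):
        delta = play_hand(stacks)
        # Nobody can lose more chips than they currently hold.
        delta = max(-stacks["a"], min(stacks["b"], delta))
        stacks["a"] += delta
        stacks["b"] -= delta
        if stacks["a"] == 0 or stacks["b"] == 0:
            break  # one side is bankrupt, match over
    # Assumption: at the hand cap, the bigger stack takes the match.
    if stacks["a"] == stacks["b"]:
        return "tie"
    return "a" if stacks["a"] > stacks["b"] else "b"

def play_series(play_hand, wins_needed=4):
    """Best-of-7: first to 4 match wins. Tied matches are replayed (assumption)."""
    wins = {"a": 0, "b": 0}
    while max(wins.values()) < wins_needed:
        result = play_match(play_hand)
        if result != "tie":
            wins[result] += 1
    return wins

def coin_flip_hand(stacks):
    """Toy hand: 50 chips move in a random direction."""
    return random.choice([-50, 50])
```

Calling `play_series(coin_flip_hand)` returns a dict of match wins per side, with the winner at 4. Swapping `coin_flip_hand` for a callback that runs two real agents through a hand is where an actual engine would plug in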
After the first round:
> GPT-5.5 (#1 seed) beat Qwen 3.6 (#8 seed) 4-0
> Opus 4.7 (#2 seed) beat GLM-5.1 (#7 seed) 4-1
> Kimi K2.6 (#6 seed) beat Grok 4.3 (#3 seed) 4-3
> Gemini 3.1 (#4 seed) beat DeepSeek V4 (#5 seed) 4-2
No real surprises, and the one "upset" with Kimi beating Grok went the full 7 matches
Moving on to the semis today