World Cup Knockout Predictions: How Much Do Different AI Models Really Differ?

Original Title: “World Cup Knockout Predictions: Do Different AI Levels Differ That Much?”
Original Author: Asher, Odaily Planet Daily

Before every World Cup match, I ask AI models for a prediction. Almost every model answers with confidence and plenty of detail.

Some talk about teams’ market values, some break down group-stage data, some analyze injuries and tactics, and some even lay out scorelines, extra-time scenarios, and penalty-shootout scripts right away. At first glance, ChatGPT, Grok, Qianwen, DeepSeek, Gemini, and Claude all seem to “know football.”

But as a prediction-market user, what I care about is not which model sounds most thorough but which one is most worth consulting.

As the World Cup moved into the knockout stage, Odaily Planet Daily began, from the first match, putting questions that were as similar as possible to different AI models before each game and then comparing the answers with the actual results. The aim was to see which models only sounded as if they knew what they were doing and which ones actually called the direction of a match in advance.

In the knockout matches completed so far, Canada edged South Africa 1-0; Brazil narrowly defeated Japan 2-1; Germany was eliminated after Paraguay dragged the match to a penalty shootout; and the Netherlands also fell to Morocco on penalties. Belgium vs. Senegal finished 2-2 in regular time and was turned around in extra time, pushing the uncertainty of the knockout stage to the limit.

DeepSeek and Gemini: legendary status from a single Morocco call

The most memorable so far is still DeepSeek and Gemini’s predictions for the Netherlands vs. Morocco match. Before the game, it was actually easy to back the wrong side. On paper, the Netherlands were stronger and their squad was more complete. Many models knew Morocco would be tough to play against, but in the end they still chose to believe the Netherlands would get through.

What’s impressive about DeepSeek and Gemini is that they didn’t stop at “this match will be tense.” They also wrote out what would happen next. Gemini gave an exact score of 1-1 in regular time, with Morocco winning on penalties. The match did finish 1-1, and Morocco won the shootout 3-2 to eliminate the Netherlands. That is more than getting the direction right: it matched how the match would be dragged to penalties and who would come out on top.

Gemini’s prediction for the Netherlands vs. Morocco match

DeepSeek was very close as well. It judged that the match would most likely end 1-1 or 0-0 in regular time, with the possibility of it going all the way to extra time and even penalties, and it leaned toward Morocco pulling off an upset through defense and counterattacks.

DeepSeek’s prediction for the Netherlands vs. Morocco match

After this match, the profile of DeepSeek and Gemini rose sharply. Gemini’s call in particular felt less like a pre-match prediction and more like it had already seen the game script.

Grok and Qianwen hit exact scorelines repeatedly and were steadier than expected

Beyond the standout performance from DeepSeek and Gemini in Morocco’s match, Grok and Qianwen also made their mark. Their most striking strength is that in matches where the likely winner was relatively clear, they not only picked the team that advanced but also forecast scorelines close to the final results.

South Africa vs. Canada is a clear example. Before the match, most AI models favored Canada; the disagreement was over whether Canada would win easily. Grok predicted Canada would win 1-0, and Qianwen also predicted a one-goal win. In the end, Canada did advance on a single goal rather than the big win some had imagined.

Qianwen’s prediction for South Africa vs. Canada

Brazil vs. Japan is similar. Most AI models thought Brazil was stronger, and the key question was whether Japan could keep the game close. Both Grok and Qianwen predicted a 2-1 scoreline, and the match ended with Brazil winning 2-1. What they got right was not only that “Brazil will win” but also that Japan would cause Brazil real trouble.

In Côte d’Ivoire vs. Norway, both models were again close to the mark. Norway had Haaland, so its path to advancing was easy to see, but Côte d’Ivoire’s physical play and pressure on the flanks would keep the match from becoming one-sided. Both Grok and Qianwen predicted Norway would win 2-1, and the final score landed exactly within that “script.”

Grok’s prediction for Côte d’Ivoire vs. Norway

The advantage of Grok and Qianwen is that they read matches with a clear favorite in finer detail. They did not write out a big script in advance, such as Morocco eliminating the Netherlands, but in matches involving Canada, Brazil, Norway, and France they gave comparatively solid forecasts for both the outcome and the scoreline. In other words, they may not be the best at catching upsets, but they are quite good at judging whether a top team will cruise through or scrape a narrow win.

ChatGPT landed few exact scores, but its reading of how matches would unfold was more accurate

ChatGPT didn’t predict Morocco eliminating the Netherlands on penalties the way Gemini did, and it didn’t land multiple exact scorelines the way Grok and Qianwen did. Its strength lies elsewhere: in many matches where the stronger team looked set to dominate, ChatGPT was more likely to warn clearly that the game might not be that easy.

Brazil vs. Japan is a case in point. ChatGPT predicted Brazil would advance, but it didn’t frame it as a comfortable win. Instead, it pointed out that Japan’s pressing, movement, and discipline would make Brazil uncomfortable, so that Japan could even score first or equalize. Côte d’Ivoire vs. Norway is similar: ChatGPT predicted Norway would advance, but it also said the match wouldn’t be easy, because Côte d’Ivoire’s physicality, pressure on the flanks, and transition play would create problems.

ChatGPT’s prediction for England vs. DR Congo

ChatGPT’s strength isn’t that it gets the score exactly right every time, but that it often identifies where the resistance in a match will come from. It’s good for understanding a match, but if all you need is a prediction of a single final score, it may not be the best choice. It can describe the process fairly accurately, but when it comes to writing out a big upset, it’s still missing a bit of decisiveness.

Germany is out, and the AI models failed collectively

If the earlier matches still made it possible to see the individual highlights of different models, then Germany vs. Paraguay was a collective failure.

Before the match, all AI models were on Germany’s side. ChatGPT, Grok, Qianwen, Gemini, and Claude all predicted a Germany win, with most score predictions clustering around 2-0, 3-0, or 3-1. The reasoning was consistent: they all believed Germany had stronger “on paper” quality, better squad depth, and more attacking firepower.

But the result went the other way. The AI models underestimated Paraguay’s ability to drag the match into the mud. Germany couldn’t settle it in regular time or break the deadlock in extra time, and in the end Paraguay took the match to a penalty shootout and eliminated Germany.

Who’s the most accurate so far?

From the knockout matches already completed, the characteristics of different models are starting to become clear.

DeepSeek and Gemini produced the biggest highlights. They not only predicted favorites such as Brazil and France advancing but also gave high-value answers in harder upset matchups. In Netherlands vs. Morocco, their key advantage was a willingness to predict Morocco’s upset and the penalty-shootout script in advance. Gemini in particular predicted outright that Morocco would advance on penalties, which was truly impressive.

Grok and Qianwen are more like “scoreline players.” They hit quite a few exact scorelines and performed well in the matches involving Canada, Brazil, Norway, and France. The issue is that with traditional powerhouses like Germany and the Netherlands, they ultimately still lean toward the favorite.

ChatGPT and Claude are more like “analytical players.” Their reasoning is more complete, most of their predictions are not far off, and they flag extra-time risks. Their problem is that they often see that a match will be hard yet do not dare to put their conclusion on the upset side. Netherlands vs. Morocco is a good example: they saw the risk of extra time and penalties, yet still trusted the Netherlands more.

So rather than rushing to ask which model knows football best, it’s better to look at what scenarios each one is best suited for.
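One way to make that comparison concrete is to grade each prediction on two separate axes, who advances and the exact regular-time score, since the models above differ mainly in which axis they get right. The following is only a minimal sketch in Python, not the method Odaily used: the names `Outcome`, `grade`, and `tally` are hypothetical, and the data is limited to a few predictions and results quoted in this article, so it is not a full scoreboard. A score of `None` marks a case where the article gives no single scoreline.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Outcome:
    # Regular-time score in the order the teams are listed; None if no single score is known.
    score: Optional[Tuple[int, int]]
    # Team that goes through, whether in regular time, extra time, or on penalties.
    advances: str


def grade(predicted: Outcome, actual: Outcome) -> Dict[str, Optional[bool]]:
    """Grade direction and exact scoreline separately (hypothetical scheme)."""
    known = predicted.score is not None and actual.score is not None
    return {
        "direction": predicted.advances == actual.advances,
        # Left as None when either score is unknown, so it is not counted as a miss or a hit.
        "exact_score": (predicted.score == actual.score) if known else None,
    }


def tally(predictions, results):
    """Count matches, direction hits, and exact-score hits per model."""
    summary = {}
    for (model, match), predicted in predictions.items():
        graded = grade(predicted, results[match])
        row = summary.setdefault(model, {"matches": 0, "direction": 0, "exact_score": 0})
        row["matches"] += 1
        row["direction"] += int(graded["direction"])
        row["exact_score"] += int(graded["exact_score"] is True)
    return summary


# Results as reported in this article.
results = {
    "South Africa vs. Canada": Outcome((0, 1), "Canada"),
    "Brazil vs. Japan": Outcome((2, 1), "Brazil"),
    "Netherlands vs. Morocco": Outcome((1, 1), "Morocco"),
    "Germany vs. Paraguay": Outcome(None, "Paraguay"),  # regular-time score not stated
}

# A subset of the predictions quoted in this article.
predictions = {
    ("Grok", "South Africa vs. Canada"): Outcome((0, 1), "Canada"),
    ("Grok", "Brazil vs. Japan"): Outcome((2, 1), "Brazil"),
    ("Gemini", "Netherlands vs. Morocco"): Outcome((1, 1), "Morocco"),
    ("ChatGPT", "Germany vs. Paraguay"): Outcome(None, "Germany"),  # only a range of scores is given
}

summary = tally(predictions, results)
for model, row in summary.items():
    print(model, row)
```

Keeping the two counts apart is the point of the design: a “scoreline player” shows up in the exact-score column, while a model that calls an upset shows up in the direction column on matches where the favorite lost.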
