Tens of millions of errors per hour, investigation reveals the "illusion of accuracy" in Google AI search
Author: Claude, Deep Tide TechFlow
Deep Tide Guide: A recent test conducted jointly by The New York Times and AI startup Oumi shows that Google’s AI summary feature (AI Overviews) has an accuracy rate of about 91%. But set that against the scale of Google processing 5 trillion searches per year, and it means tens of millions of incorrect answers are generated every hour. More troubling still, even when the answers are correct, more than half of the cited links fail to support their conclusions.
Google is delivering wrong information to users on an unprecedented scale, and most people are completely unaware.
According to The New York Times, AI startup Oumi, commissioned by the paper, used SimpleQA—an industry-standard benchmark developed by OpenAI—to assess the accuracy of Google’s AI Overviews feature. The test covered 4,326 search queries across two rounds: one last October (powered by Gemini 2) and one this February (after the upgrade to Gemini 3). The results show Gemini 2 had an accuracy rate of about 85%, and Gemini 3 raised that to 91%.
91% sounds good, but at Google’s volume it tells a different story. Google handles about 5 trillion search queries every year. At a 9% error rate, AI Overviews produces more than 57 million inaccurate answers every hour—nearly 1 million per minute.
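The scale claim above is a straightforward back-of-envelope calculation. The sketch below uses the article's round numbers (5 trillion queries per year, 9% error rate); the exact total depends on rounding and on how many queries actually trigger an AI Overview, so the result lands in the same tens-of-millions-per-hour range rather than matching the article's 57 million figure exactly.

```python
# Back-of-envelope check of the error-volume claim, using the
# article's round figures as assumptions.
QUERIES_PER_YEAR = 5_000_000_000_000  # ~5 trillion searches per year
ERROR_RATE = 0.09                     # ~9% of answers inaccurate (91% accuracy)

HOURS_PER_YEAR = 365 * 24

queries_per_hour = QUERIES_PER_YEAR / HOURS_PER_YEAR
errors_per_hour = queries_per_hour * ERROR_RATE
errors_per_minute = errors_per_hour / 60

# Tens of millions of errors per hour, close to a million per minute.
print(f"{errors_per_hour:,.0f} errors/hour, {errors_per_minute:,.0f} errors/minute")
```

Either way, the order of magnitude is the point: a single-digit error rate applied to trillions of queries yields eight-figure error counts every hour.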
The answers are right, but the sources are wrong
More troubling than the accuracy rate is the “citation detachment” problem of the sources.
According to Oumi’s data, in the Gemini 2 era, 37% of correct answers suffered from “unsupported citations”: the links attached to the AI summary did not back up the information it provided. After the upgrade to Gemini 3, that proportion did not decrease—it jumped to 56%. In other words, even as the model gives more correct answers, it is increasingly unable to show its work.
Oumi CEO Manos Koukoumidis put the challenge bluntly: “Even if the answer is correct, how do you know it’s correct? How do you verify it?”
AI Overviews’ heavy reliance on low-quality sources exacerbates the problem. Oumi found that Facebook and Reddit are the second- and fourth-most cited sources in AI Overviews, respectively. Among inaccurate answers, Facebook accounts for 7% of citations, versus 5% among accurate ones.
A fake article by a BBC reporter successfully “poisoned” the system within 24 hours
Another serious flaw of AI Overviews is that it is extremely easy to manipulate.
A BBC reporter tested it with a deliberately fabricated false article. In less than 24 hours, Google’s AI summaries presented the false information to users as fact.
This means that anyone familiar with how the system works could “poison” AI search results by publishing false content and boosting its traffic. Google spokesperson Ned Adriance responded that the search AI features are built on the same ranking and safety mechanisms used to block spam, adding that “most of the examples in the test are unrealistic queries that people would never actually search.”
Google pushes back: The test itself has problems
Google raised multiple objections to Oumi’s research. A Google spokesperson said the study “has serious flaws,” citing several reasons: the SimpleQA benchmark itself contains inaccurate information; Oumi used its own AI model, HallOumi, to evaluate another AI’s performance, which could introduce additional errors; and the test content does not reflect users’ real search behavior.
Google’s internal testing also showed that when Gemini 3 runs on its own, outside Google’s search framework, its rate of false outputs can reach 28%. But Google emphasized that AI Overviews uses Google’s search ranking system to improve accuracy and performs better than the model alone.
However, as PCMag pointed out, there is a logical paradox here: if your defense is that “the report pointing out our AI’s inaccuracy also used an AI that may be inaccurate,” that is unlikely to increase users’ confidence in the accuracy of your product.