Google issues another "technical clarification" amid the controversy over the paper that crashed global storage stocks


Can TurboQuant's "Technical Clarification" Calm the Academic Controversy?

On April 1st, after nearly a week of silence, the Google team behind the controversial compression algorithm TurboQuant finally responded. The latest "technical clarification," however, seems unlikely to quell the controversy. On the allegation of "core technical similarity," Google argued that random rotation is a standard technique, and it claimed that the errors in its experimental benchmarks do not materially change the paper's conclusions.

In the last week of March, the paper, heavily promoted on Google's official blog, single-handedly triggered a plunge in global storage chip stocks, with Micron, SK Hynix, Samsung Electronics, and others shedding over $90 billion in combined market value. The paper claimed that TurboQuant's compression algorithm could reduce the KV cache memory usage of large language models by a factor of at least six and increase speed by up to a factor of eight, with zero loss of accuracy.

Wall Street's panic was rooted in a simple inference: if software can compress AI memory requirements sixfold, the growth logic for memory hardware would need to be rewritten.
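To see the scale of that claim, consider a back-of-envelope KV cache sizing calculation. The model dimensions below are illustrative, loosely shaped like a Llama-style 70B model, and are not taken from the paper:

```python
# Rough KV cache sizing for one long-context request.
# All dimensions here are illustrative assumptions, not TurboQuant's setup.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    # 2x accounts for storing both keys and values at every
    # layer, head, and sequence position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

baseline = kv_cache_bytes(
    n_layers=80, n_kv_heads=8, head_dim=128, seq_len=32_768,
    bytes_per_value=2,  # fp16/bf16
)

print(f"fp16 KV cache per request: {baseline / 2**30:.1f} GiB")      # ~10.0 GiB
print(f"at the claimed 6x ratio:   {baseline / 6 / 2**30:.2f} GiB")  # ~1.67 GiB
```

If a roughly 10 GiB cache shrinks to under 2 GiB, the same card can hold several times as many in-flight requests, which is exactly the hardware-demand logic the market reacted to.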

The reversal came fast, however. On March 27th, Gai Jianyang, the author of RaBitQ and a postdoctoral researcher at ETH Zurich, published a lengthy post on Zhihu accusing the Google team of systematic academic misconduct, and public opinion swiftly turned to questioning Google's academic integrity.

The industry broadly agrees that RaBitQ proposed the original method first and that TurboQuant built its optimizations on top of it, but without proper citation or acknowledgment, and even dismissed the prior work without justification.

On April 1st, in response to external accusations, the second author of the paper, Majid Daliri, finally stepped forward and posted a four-point “technical clarification” on the OpenReview platform.

On the novelty of the core technology, Google argued that TurboQuant's core method is not derived from RaBitQ, because "random rotation is a standard, ubiquitous technique in the quantization literature" that was widely used long before RaBitQ appeared. TurboQuant's real innovation, the authors said, lies in precisely deriving the distribution of coordinates after rotation.
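For readers outside the field, the shared ingredient both sides are describing can be sketched in a few lines: multiply vectors by a random orthogonal matrix before scalar quantization, so their energy spreads evenly across coordinates. The sketch below is a generic illustration of that standard technique (the QR-based rotation and the 4-bit uniform quantizer are this article's assumptions), not TurboQuant's or RaBitQ's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(dim):
    # QR decomposition of a Gaussian matrix yields a uniformly random
    # orthogonal matrix (after sign correction) -- a standard construction.
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q * np.sign(np.diag(r))

def quantize(x, n_bits=4):
    # Plain uniform scalar quantization to n_bits per coordinate.
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    return np.round(x / scale), scale

dim = 128
R = random_rotation(dim)
x = rng.normal(size=dim) * np.linspace(0.1, 10, dim)  # deliberately skewed

# After rotation, each coordinate is approximately Gaussian with roughly
# equal variance, which is what makes simple per-coordinate quantizers
# behave well -- and, per the authors, what TurboQuant's analysis
# characterizes exactly.
codes, scale = quantize(R @ x)
x_hat = R.T @ (codes * scale)  # dequantize, then rotate back

err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error at 4 bits: {err:.4f}")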

But academic norms dictate that if someone is the first to put a "wheel" on a "car" and build a complete vehicle, citing and acknowledging that work is basic courtesy for anyone who builds on it. By downplaying the prior work as industry common knowledge, Google effectively diminishes the pioneers' contribution.

Second, on the accusation that TurboQuant wrongly dismissed RaBitQ's theory as "suboptimal," the authors admitted they had overlooked a constant factor in RaBitQ's appendix, which led them to that hasty conclusion: "we honestly believed at the time that the method was suboptimal." After careful review, they now acknowledge that RaBitQ is indeed optimal, and the team is updating the TurboQuant manuscript.

However, basing a negative verdict on a core theoretical contribution, in a paper submitted to a top conference, on "not reading the appendix carefully" invites skepticism about how much this explanation really excuses.

Third, addressing the accusation of "rigging the race" by handicapping the opponent, Majid Daliri argued directly that even if the runtime comparison with RaBitQ were dropped entirely, the paper's scientific contribution and validity would remain largely unchanged, because TurboQuant's main contribution is the compression-quality trade-off, not raw acceleration.

Gai Jianyang had previously disclosed in an open letter that the Google team benchmarked RaBitQ on a single CPU core with multi-threading disabled, while running TurboQuant on an NVIDIA A100 GPU. And although the team now claims speed was never the core focus, the paper itself highlighted speed as one of its key selling points.

Finally, Google's response hinted at bad faith on the other side, noting that the paper had been posted on arXiv as early as April 2025, giving critics nearly a year to raise concerns through academic channels; yet they escalated only after the paper gained widespread attention.

According to Gai Jianyang's earlier responses, however, the two sides had communicated privately by email as early as May 2025 and contacted the ICLR organizers in November 2025, receiving no effective response. Only when Google's official channels pushed the paper to massive exposure did public academic correction become urgent.

On OpenReview, some researchers commented that this is a serious issue deserving more attention: "It's frustrating to see those doing fundamental research get ignored while large, influential organizations promote their own results." In that sense, it feels less like science and more like a PR contest with big tech companies.

Meanwhile, a reviewer of the TurboQuant paper also weighed in, stating that they had praised the paper's theoretical analysis and experimental results.

"But I also explicitly pointed out that both RaBitQ and TurboQuant use random rotation, and asked the authors to analyze how the design differences between the two affect performance." The reviewer added that proper academic practice would be to discuss the differences between RaBitQ and TurboQuant thoroughly in the paper; instead, "RaBitQ was only mentioned once, in the experimental section of the main paper."

Undeniably, TurboQuant has real commercial potential at the technical level. As one analysis by an AI practitioner on Zhihu pointed out, in large-model inference, KV cache memory usage directly determines how many requests a single card can serve concurrently, a core economic metric for inference providers. On the same GPU, sixfold concurrency would in theory cut the inference cost per request to one-sixth. For AI companies handling billions of API calls a day, that is a major cost lever, which explains the recent stock market turbulence.
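The arithmetic behind that claim is simple enough to write down. Every number in the sketch below (card memory, weight footprint, per-request cache size, hourly price) is a hypothetical placeholder for illustration, not a figure from the paper or any provider:

```python
# Illustrative serving economics. All constants are assumptions.
GPU_MEMORY_GIB = 80          # e.g., an A100/H100-class card
WEIGHTS_GIB = 40             # memory reserved for model weights
KV_PER_REQUEST_GIB = 10.0    # uncompressed KV cache per request
COMPRESSION = 6              # TurboQuant's claimed minimum ratio
COST_PER_GPU_HOUR = 2.0      # assumed rental price in dollars

free = GPU_MEMORY_GIB - WEIGHTS_GIB
for ratio in (1, COMPRESSION):
    concurrent = int(free / (KV_PER_REQUEST_GIB / ratio))
    cost = COST_PER_GPU_HOUR / concurrent
    print(f"{ratio}x compression: {concurrent:3d} concurrent requests, "
          f"${cost:.3f}/request-hour")
```

With the cache as the binding constraint, sixfold concurrency on a fixed-cost card works out to roughly the one-sixth per-request cost the analysis describes.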

Google's paper is slated for presentation at ICLR 2026, a top machine-learning conference, at the end of April, but the team must first weather this academic controversy. How the storm ultimately resolves remains to be seen.

(This article is from First Financial)

