Google has just made a strategic move at the Cloud Next conference in Las Vegas: it launched two new eighth-generation TPU processors at once, marking an important turning point in how the company views the AI market.

For the first time, Google has fully separated its training and inference chips: the TPU 8t is focused on training AI models, while the TPU 8i is optimized for running those models in production. It is a clear tactical shift, an acknowledgment that the two workloads have very different needs.

The performance numbers are interesting. The TPU 8t offers 124% more efficiency per watt than the previous generation, while the TPU 8i shows a 117% improvement. Compared with Ironwood, released in November, the TPU 8t delivers 2.8 times the performance at the same price, and the 8i is 80% more efficient. These gains are not trivial.

What stands out most about the TPU 8t is its ability to scale to 9,600 units in a single system. Google knows that at this scale, power is the critical limiting factor for data centers, which is why energy efficiency has become the top priority.

The TPU 8i takes a different approach. Each chip carries 384 MB of SRAM, three times as much as Ironwood. That makes sense for inference, where the chip must work through multiple reasoning steps without constantly fetching data from off-chip memory, and it is ideal for running complex AI agents.

Both processors are set to hit the market at the end of 2026. Sundar Pichai, CEO of Alphabet, made it clear that the architecture was designed to “run millions of agents simultaneously in a cost-effective way.” That is the point: it is not just about having better chips, but about running them without breaking the budget.

On the software side, Google launched the Gemini Enterprise Agent Platform with new features. Memory Bank and Memory Profile let agents remember past interactions with users, solving a real shortcoming of older tools. There is also Agent Simulation for testing agents more thoroughly before deployment.

The Projects platform integrates data from Workspace, OneDrive, and corporate chats to give agents context. Google also launched Gemini Enterprise for regular employees, positioning it as an “AI assistant for everyone” that requires no code.

All of this amounts to a two-front attack, hardware and software, on Nvidia, OpenAI, and Anthropic. Google is well aware that Valley engineers often switch between Anthropic’s Claude and OpenAI’s Codex for AI development and rarely consider Google’s tools, and that clearly bothers leadership.

TPU adoption is accelerating. Citadel Securities has already built quantitative software on Google’s TPUs. The US Department of Energy’s 17 national laboratories use TPU-based collaborative tools. Meta signed a long-term agreement to use Google’s TPUs, and Anthropic committed to gigawatt-scale compute capacity.

DA Davidson analysts estimate that the combined value of Google’s TPU and DeepMind businesses exceeded $900 billion last September.

Interestingly, Google did not directly compare its new TPUs with Nvidia products. Meanwhile, Nvidia is about to launch a new line incorporating technology from Groq, which it acquired for $20 billion, aimed specifically at ultra-low-latency inference. Nvidia’s Jensen Huang has stated that more than 20% of AI workloads could be better handled by this type of chip.

Google is piloting TPU deployments in customer data centers and promoting compatibility with third-party tools. But supply-chain bottlenecks, and the mismatch between rapid model iteration and chip development cycles that take years, remain real obstacles to scaling.