Huang Renxun GTC 2026 Reveals "Token Economics": Computing as Revenue, Nvidia Fully Mass Produces Vera Rubin, Taiwan's AI Demand Rockets

Huang Renxun unveiled a new era of "Token Economics" at GTC Taipei 2026: AI data centers are shifting from a hardware-sales model to a "computation as revenue" model, in which each Token is a tradable, profit-generating unit of value. The Vera Rubin architecture is now in full mass production and, paired with Groq LPU decoupled inference, is said to lift a 1GW data center's annual revenue from $30 billion to $300 billion. Huang also announced the activation of NVIDIA's Constellation Taipei headquarters, as Taiwan's AI computing demand "skyrockets."

Table of Contents

  • Token as Revenue: The Business Equation of AI Factories
  • Vera Rubin Full Mass Production: Supply Chain Scale Doubles
  • Decoupled Inference: NVIDIA + Groq Create a "Token Dual Engine"

NVIDIA CEO Huang Renxun made a major declaration at GTC Taipei 2026 on June 1: "Tokens are assets; tokens have become profit-generating revenue units." He stated plainly that the business logic of the AI industry is flipping, from selling GPU hardware to selling "computational output."

The keynote at the Taipei Music Center, held alongside COMPUTEX 2026, not only revisited key announcements from GTC San Jose but also used figures to argue that a 1GW AI data center upgraded from Blackwell to Vera Rubin with Groq decoupled inference can see annual revenue jump from about $30 billion to $300 billion: a "tenfold growth" story that has excited the entire supply chain.

Token as Revenue: The Business Equation of AI Factories

Huang Renxun laid out the business logic of "Token Economics" in detail. He pointed out that AI inference has evolved from "answering questions" to "generating profit": each Token output can map directly to an end customer's willingness to pay. NVIDIA has defined five Token pricing tiers:

  • Free Tier: Basic Q&A, customer service
  • Lightweight (about $5 per million Tokens): Content generation, summarization
  • Professional (about $30 per million Tokens): Code generation, data analysis
  • Enterprise (about $80 per million Tokens): Regulatory compliance, financial modeling
  • Premium (about $150 per million Tokens): Scientific research, drug discovery, real-time inference
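To make the tier list concrete, here is a minimal sketch that prices a workload at each tier. It assumes the approximate per-million-Token figures quoted above, treats the Free tier as $0, and assumes billing is purely linear in Token count; the names `TIER_PRICE_PER_MILLION` and `token_bill` are hypothetical and not part of any NVIDIA product or API.

```python
# Approximate USD prices per million output Tokens, taken from the tier list above.
# Assumption: the Free tier is modeled as $0 and billing is strictly linear.
TIER_PRICE_PER_MILLION = {
    "free": 0.0,
    "lightweight": 5.0,
    "professional": 30.0,
    "enterprise": 80.0,
    "premium": 150.0,
}


def token_bill(tokens: int, tier: str) -> float:
    """Return the charge in USD for `tokens` output Tokens at the given tier."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    return tokens / 1_000_000 * TIER_PRICE_PER_MILLION[tier]


if __name__ == "__main__":
    # A hypothetical workload of 250 million Tokens, priced at every tier.
    for tier in TIER_PRICE_PER_MILLION:
        print(f"{tier:>12}: ${token_bill(250_000_000, tier):,.2f}")
```

The same 250 million Tokens cost $1,250 at the Lightweight tier but $37,500 at the Premium tier, which is the spread the "Token as revenue" pitch relies on.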

"Every Token can make money. AI companies will want to build more Tokens, generate more Tokens, produce more AI factories." Huang emphasized that this is why Taiwan's computing demand has already "skyrocketed": when computation directly equals revenue, expanding data centers becomes inevitable.

Vera Rubin Full Mass Production: Supply Chain Scale Doubles

As the most anticipated hardware release at GTC 2026, the Vera Rubin architecture has officially entered full mass production. Huang revealed that Vera Rubin's supply chain is twice the scale of Grace Blackwell's, with more than 150 Taiwanese supply chain partners involved.

The flagship Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs in a fully liquid-cooled design, allowing large-scale AI models to be deployed within a single rack. Huang also shared the roadmap for the next-generation Feynman architecture, which aims to push inference performance and energy efficiency further.

Notably, Huang hinted at unannounced "surprise new products" for the second half of the year, fueling market expectations for consumer GPUs, automotive chips, and other new hardware.

Decoupled Inference: NVIDIA + Groq Create a "Token Dual Engine"

Huang specifically highlighted NVIDIA's collaboration with Groq, a pioneer of the Language Processing Unit (LPU). Whereas GPUs excel at massively parallel computation, the Groq 3 LPX chips, manufactured by Samsung and expected to ship in Q3, target scenarios that demand the lowest possible single-request latency; in real-time inference tasks requiring millisecond responses, Groq LPUs significantly outperform traditional GPUs.
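The division of labor described above (GPUs for throughput, LPUs for minimum latency) can be pictured as a scheduler that splits incoming requests between two hardware pools. The sketch below is a toy latency-based router, not NVIDIA's or Groq's actual scheduler: the 50 ms threshold, the `Request` type, and the `"lpu_pool"`/`"gpu_pool"` names are all hypothetical, and the keynote coverage does not say how real decoupled-inference systems divide the work.

```python
from dataclasses import dataclass

# Hypothetical cutoff: requests with a tighter latency budget than this are
# treated as latency-critical. Not a figure from the keynote.
LATENCY_CRITICAL_MS = 50.0


@dataclass
class Request:
    request_id: str
    latency_budget_ms: float  # how quickly the caller needs a response


def route(req: Request) -> str:
    """Send latency-critical requests to the LPU pool, everything else to GPUs."""
    if req.latency_budget_ms < LATENCY_CRITICAL_MS:
        return "lpu_pool"  # real-time work: minimize single-request latency
    return "gpu_pool"  # bulk work: maximize parallel throughput


if __name__ == "__main__":
    batch = [
        Request("voice-assistant", 20),
        Request("nightly-report", 2_000),
        Request("trading-signal", 35),
    ]
    for r in batch:
        print(r.request_id, "->", route(r))
```

A production scheduler would also weigh queue depth and cost per Token, but a single threshold is enough to show why pairing two kinds of hardware lets each pool run the traffic it handles best.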

Huang used a simple formula to explain the business power of "decoupled inference":

  • Blackwell Generation: 1GW data center generates about $30 billion annually
  • Vera Rubin Generation: Under the same power consumption, annual revenue can reach $150 billion (5x)
  • Vera Rubin + Groq Decoupled Inference: annual revenue hits $300 billion (10x)
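The arithmetic behind these figures can be checked in a few lines. The sketch below reproduces the 5x and 10x multiples from the quoted revenues and, as an illustrative assumption only, estimates how many Tokens per second a 1GW site would need to sell around the clock to reach $300 billion if every Token went at the Professional tier's flat $30 per million. It ignores utilization, tier mix, and free traffic; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

# Annual revenue for a 1GW data center, as quoted in the keynote (USD).
REVENUE_USD = {
    "Blackwell": 30e9,
    "Vera Rubin": 150e9,
    "Vera Rubin + Groq": 300e9,
}


def multiple_vs_baseline(generation: str, baseline: str = "Blackwell") -> float:
    """Revenue of `generation` expressed as a multiple of the baseline."""
    return REVENUE_USD[generation] / REVENUE_USD[baseline]


def implied_tokens_per_second(annual_revenue_usd: float,
                              price_per_million_usd: float) -> float:
    """Tokens/s that must be sold all year at one flat price to hit the revenue.

    Assumption: 100% of capacity is sold at a single price, with no free tier.
    """
    tokens_per_year = annual_revenue_usd / price_per_million_usd * 1_000_000
    return tokens_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    for gen in REVENUE_USD:
        print(f"{gen}: {multiple_vs_baseline(gen):.0f}x")
    rate = implied_tokens_per_second(REVENUE_USD["Vera Rubin + Groq"], 30.0)
    print(f"Required throughput at $30/M: {rate:,.0f} Tokens/s")
```

Under that flat-price assumption, $300 billion a year works out to roughly 317 million Tokens per second, which illustrates why the pitch leans on both higher throughput and the pricier, latency-sensitive tiers.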

Data centers are shifting from training models to serving as factories that produce Tokens.

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.