Jensen Huang's Full GTC Speech: The Era of Inference Has Arrived, 2027 Revenue at Least $1 Trillion, Robotaxi Is the New Operating System

On March 16, 2026, NVIDIA GTC 2026 officially opened, with Founder and CEO Jensen Huang delivering the keynote speech.

At this event, regarded as the “AI Industry’s Annual Pilgrimage,” Huang explained NVIDIA’s transformation from a “chip company” to an “AI infrastructure and factory company.” Addressing market concerns about sustained performance and growth potential, Huang detailed the underlying business logic driving future expansion—“Token Factory Economics.”

Revenue guidance is extremely optimistic: "at least $1 trillion of demand by 2027"

Over the past two years, global AI computing demand has exploded exponentially. As large models evolve from "perception" and "generation" to "reasoning" and "action (task execution)," the consumption of computing power has surged sharply. Confronted with market concerns about order and revenue ceilings, Huang offered an emphatic answer.

In his speech, Huang openly stated:

A year ago, I mentioned we saw a high-confidence demand of $500 billion, covering Blackwell and Rubin until 2026. Now, right here, I see at least $1 trillion of demand by 2027.

Huang's trillion-dollar forecast briefly drove NVIDIA's stock price up more than 4.3%.

Moreover, he added:

Is this reasonable? That’s what I’m about to discuss. In fact, we might even be undersupplied. I am certain that actual computing demand will be much higher than this.

Huang pointed out that NVIDIA's current systems have proven themselves to be the world's "lowest-cost infrastructure." Because NVIDIA can run nearly all AI models across various fields, this versatility lets customers' $1 trillion investment be fully utilized over a long lifecycle.

Currently, 60% of NVIDIA’s business comes from the top five hyperscale cloud providers, while the remaining 40% is widely distributed across sovereign clouds, enterprises, industrial sectors, robotics, and edge computing.

Token Factory Economics: Performance per watt determines business lifeblood

To explain why this $1 trillion of demand is reasonable, Huang presented a new business mindset to global CEOs. He pointed out that future data centers will no longer be file-storage warehouses but "token factories": production lines for tokens, the basic units of AI output.

Huang emphasized:

Every data center, every factory, by definition, is limited by power. A 1 GW (gigawatt) factory will never become 2 GW; that is a law of physics. Under a fixed power budget, whoever achieves the highest token throughput per watt has the lowest production cost.
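The arithmetic behind this claim can be made concrete with a small sketch. All figures below (electricity price, tokens-per-watt numbers) are illustrative assumptions, not from the keynote; the point is that under a fixed power cap the facility's power rating cancels out, leaving tokens-per-watt as the sole driver of cost per token.

```python
# Illustrative sketch (all figures hypothetical): under a fixed power
# budget, the electricity cost per token depends only on tokens/watt.
def cost_per_million_tokens(power_watts, tokens_per_watt_per_sec,
                            dollars_per_kwh):
    """Electricity cost to produce one million tokens in a
    power-capped 'token factory'."""
    tokens_per_sec = power_watts * tokens_per_watt_per_sec
    dollars_per_sec = (power_watts / 1000) * dollars_per_kwh / 3600
    return dollars_per_sec / tokens_per_sec * 1_000_000

# Same 1 GW facility, two architectures differing only in tokens/watt:
gw = 1_000_000_000
baseline = cost_per_million_tokens(gw, 0.022, 0.08)
improved = cost_per_million_tokens(gw, 0.700, 0.08)
# `power_watts` cancels in the ratio, so only tokens/watt matters.
```

Note the design consequence: raising tokens-per-watt is the only lever that lowers cost once the power budget is fixed, which is exactly the competitive framing Huang describes.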

Huang divides future AI services into five business tiers:

  • Free Tier (high throughput, low speed)
  • Mid Tier (~$3 per million tokens)
  • High Tier (~$6 per million tokens)
  • Ultra-High Speed Tier (~$45 per million tokens)
  • Hyper-Speed Tier (~$150 per million tokens)

He pointed out that as models grow larger and contexts lengthen, AI becomes smarter, but token generation rate decreases. Huang stated:

In this token factory, your throughput and token generation speed will translate directly into next year's revenue.

Huang emphasized that NVIDIA’s architecture enables customers to achieve extremely high throughput at the free tier, while at the highest inference value tier, performance can be improved by an astonishing 35 times.
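The revenue side of the argument follows the same arithmetic: price per million tokens times sustained throughput. A minimal sketch, using the tier prices quoted in the speech but entirely invented throughput and utilization figures:

```python
# Hypothetical sketch: annual revenue of a token factory as
# price-per-million-tokens times sustained token throughput.
SECONDS_PER_YEAR = 365 * 24 * 3600

def annual_revenue(price_per_m_tokens, tokens_per_sec, utilization=0.6):
    """Revenue if the factory sustains `tokens_per_sec` at the given
    tier price for `utilization` of the year (figures illustrative)."""
    tokens = tokens_per_sec * SECONDS_PER_YEAR * utilization
    return tokens / 1_000_000 * price_per_m_tokens

# Tier prices are from the speech; throughputs are invented for
# illustration (faster, cheaper tiers serve far more tokens).
tiers = {"mid": (3, 50_000_000), "high": (6, 20_000_000),
         "ultra": (45, 2_000_000), "hyper": (150, 400_000)}
revenue = {name: annual_revenue(p, tps) for name, (p, tps) in tiers.items()}
```

Because revenue is linear in throughput, Huang's claim that token generation speed "translates directly into revenue" is literal: doubling sustained throughput at a given tier doubles that tier's revenue.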


Vera Rubin achieves 350x acceleration in two years; Groq fills ultra-fast inference gap

Under physical limits, NVIDIA introduced its most complex AI computing system ever, Vera Rubin. Huang said:

When I mentioned Hopper, I would hold up a chip—that was cute. But when I mention Vera Rubin, everyone thinks of the entire system. In this fully liquid-cooled system, eliminating traditional cables, racks that took two days to install now take only two hours.

Huang pointed out that through extreme end-to-end hardware-software co-design, Vera Rubin has created astonishing data leaps within the same 1GW data center:

In just two years, we increased the token generation rate from 22 million to 700 million, a 350-fold increase. Moore’s Law during the same period only brought about a 1.5x improvement.

To address bandwidth bottlenecks under ultra-fast inference conditions (e.g., 1,000 tokens/sec), NVIDIA presented a solution built around its acquired company Groq: disaggregated inference. Huang explained:

These two processors have very different characteristics. Groq chips have 500MB of SRAM, while a Rubin chip has 288GB of memory.


Huang noted that NVIDIA's Dynamo software system disaggregates the inference pipeline: prefill and KV-cache processing are handled by Vera Rubin, while latency-sensitive decoding is handled by Groq. He also offered enterprises guidance on compute configuration:

If your workload is mainly high throughput, use 100% Vera Rubin; if you have substantial high-value token generation needs, allocate about 25% of your data center to Groq.
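Huang's allocation advice amounts to a simple capacity-split rule. A minimal sketch, where the 20% threshold for "substantial" high-value demand is an assumption of ours, not a figure from the keynote:

```python
# Sketch of the capacity-split heuristic from the speech: mostly
# high-throughput workloads go 100% Vera Rubin; substantial high-value
# (latency-sensitive) token generation earmarks ~25% for Groq.
# The 0.2 threshold is an illustrative assumption, not from the keynote.
def datacenter_split(high_value_fraction, threshold=0.2):
    """Return (rubin_share, groq_share) of data-center capacity, given
    the fraction of demand that is latency-sensitive and high-value."""
    if high_value_fraction < threshold:
        return (1.0, 0.0)   # 100% Vera Rubin
    return (0.75, 0.25)     # ~25% of the floor allocated to Groq

# e.g. datacenter_split(0.05) -> all Vera Rubin;
#      datacenter_split(0.5)  -> 75/25 Rubin/Groq split
```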

Huang revealed that Samsung-manufactured Groq LP30 chips are in mass production, with shipments starting in Q3, and that the first Vera Rubin racks are already running on Microsoft Azure.

Additionally, Huang showcased Spectrum X, the world's first mass-produced co-packaged optics (CPO) switch, easing market concerns that optics would simply displace copper:

We need more copper cable capacity, more optical chip capacity, and more CPO capacity.

Agents end traditional SaaS; the "annual token budget" becomes Silicon Valley's new standard

Beyond hardware barriers, Huang dedicated much of his speech to the revolution in AI software and ecosystems, especially the explosion of AI agents.

He described open-source project OpenClaw as “the most popular open-source project in human history,” surpassing Linux’s achievements in just a few weeks. Huang directly stated that OpenClaw is essentially the “operating system” for agent computers.

Huang asserted:

Every SaaS (Software-as-a-Service) company will become an AaaS (Agent-as-a-Service) company.

To securely deploy agents capable of accessing sensitive data and executing code, NVIDIA has launched enterprise-grade NeMo Claw reference designs, which add policy engines and privacy routers.

For ordinary workers, this transformation is also imminent. Huang depicted the future workplace:

In the future, every engineer in our company will have an annual token budget. Their base annual salary might be hundreds of thousands of dollars, and I will allocate about half of that amount as token quota, enabling them to achieve 10x efficiency. This has become a new recruiting leverage in Silicon Valley: How many tokens are in your offer?
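The token-budget arithmetic is easy to work through. A back-of-envelope sketch, where the $300,000 salary and the $6-per-million-token tier price are illustrative assumptions (Huang gave only "hundreds of thousands" and "about half"):

```python
# Back-of-envelope for the "annual token budget" idea. The salary and
# the $6/M-token tier price are illustrative assumptions.
def annual_token_quota(base_salary, budget_ratio=0.5,
                       price_per_m_tokens=6.0):
    """Tokens per year an engineer can spend if `budget_ratio` of the
    base salary is granted as a token quota at the given tier price."""
    budget = base_salary * budget_ratio
    return budget / price_per_m_tokens * 1_000_000

quota = annual_token_quota(300_000)  # $150k of tokens at $6/M
# -> 25 billion tokens per year, roughly 68 million tokens per day
```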

He predicts that every company will be both a user (for engineers) and a producer (for clients) of tokens. The significance of OpenClaw is comparable to HTML or Linux.

NVIDIA’s open model initiative

In custom agent development, NVIDIA offers cutting-edge models:

  • Nemotron large language models
  • Cosmos foundational models (World Foundation Model)
  • GROOT general humanoid robot models
  • Alpamayo autonomous driving
  • BioNeMo digital biology
  • Phys-AI physics models

We are at the forefront in each field and committed to continuous iteration—Nemotron 4 after Nemotron 3, Cosmos 2 after Cosmos 1, and Groq’s second generation.

Nemotron 3 ranks among the top three models globally on OpenClaw. Nemotron 3 Ultra will be the most powerful foundational model ever, supporting countries' development of sovereign AI.

Today, we announce the Nemotron Alliance, investing billions of dollars to advance AI foundational model R&D. Members include BlackForest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam (India), Thinking Machines (Mira Murati’s lab), and others. Many enterprise software companies are integrating NVIDIA’s AI tools and NeMo Claw reference designs into their products.

Physical AI and robotics

Digital agents act in the digital world, writing code and analyzing data; physical AI embodies agents in the real world as robots.

At this GTC, 110 robots appeared, nearly covering all global robot R&D companies. NVIDIA provides three computers (training, simulation, onboard) and a complete software stack with AI models.

In autonomous driving, the “ChatGPT moment” has arrived. Today, we announced four new partners joining NVIDIA’s RoboTaxi Ready platform: BYD, Hyundai, Nissan, Geely, with an annual output of 18 million vehicles. Alongside Mercedes-Benz, Toyota, and GM, the lineup is expanding. We also announced a major partnership with Uber to deploy and connect RoboTaxi Ready vehicles in multiple cities.

In industrial robotics, companies like ABB, Universal Robots, KUKA are collaborating with us to integrate physical AI models with simulation systems, promoting deployment on manufacturing lines worldwide.

In telecommunications, Caterpillar and T-Mobile are among the partners. Future wireless base stations will evolve from simple communication nodes into NVIDIA Aerial AI RAN, an intelligent edge computing platform capable of real-time traffic sensing, beamforming adjustment, and energy-efficiency optimization.

Special segment: Olaf robot debuts

(Play Disney Olaf robot demo video)

Huang: Snowman appears! Newton is running fine! Omniverse is working perfectly! Olaf, how are you?

Olaf: I’m so happy to see you.

Huang: Yes, because I gave you a computer—Jetson!

Olaf: What’s that?

Huang: Right inside your belly.

Olaf: That’s amazing.

Huang: You learned to walk in Omniverse.

Olaf: I like walking. It’s much better than riding a reindeer and looking at the beautiful sky.

Huang: That’s thanks to physics simulation—based on NVIDIA Warp’s Newton solver, developed jointly with Disney and DeepMind, enabling you to adapt to the real physical world.

Olaf: I was just about to say that.

Huang: That's your smart part.

Olaf: I'm a snowman, not a snowball.

Huang: Can you imagine? Future Disney parks—where all these robot characters roam freely. Honestly, I thought you’d be taller. I’ve never seen such a short snowman.

Olaf: (shrugs)

Huang: Come help me finish today’s speech?

Olaf: Awesome!

Summary of the keynote

Huang: Today, we discussed the following core themes:

  1. The arrival of the reasoning inflection point: reasoning has become the most critical AI workload, tokens are the new commodities, and inference performance directly impacts revenue.

  2. The AI factory era: data centers have evolved from file storage facilities into token production factories; future competitiveness will be measured by “AI factory efficiency.”

  3. The Agent revolution: OpenClaw has ushered in the era of agent computing—enterprise IT is transitioning from tools to intelligent agents, and every company needs an OpenClaw strategy.

  4. Physical AI and robotics: embodied intelligence is scaling up, with autonomous driving, industrial robots, and humanoid robots forming the next major opportunities.

Thank you all, and enjoy GTC!
