Understanding Jensen Huang's NVIDIA GTC keynote in one article: a firm belief that computing power never sleeps
Authors: Su Yang, Hao Boyang; Source: Tencent Technology
As the picks-and-shovels seller of the AI era, Jensen Huang and his NVIDIA firmly believe that computing power never sleeps.
In his GTC keynote, Jensen Huang claimed that inference has driven a 100-fold increase in demand for computing power.
At today's GTC conference, Jensen Huang unveiled the brand-new Blackwell Ultra GPU, along with the inference- and Agent-oriented server SKUs derived from it, including an RTX line based on the Blackwell architecture. All of it revolves around computing power, but the more important question that follows is how to consume that computing power continuously and sensibly.
In Jensen Huang's view, the road to AGI requires computing power, embodied intelligent robots require computing power, and building Omniverse and world models demands a continuous stream of computing power. As for how much computing power humanity will ultimately need to construct a virtual "parallel universe", NVIDIA has given an answer: 100 times more than in the past.
To back up his point, Jensen Huang presented a set of figures at GTC: in 2024, the top four U.S. cloud providers purchased a combined 1.3 million Hopper-architecture chips; in 2025, that figure soared to 3.6 million Blackwell GPUs.
Here are some key points from the 2025 NVIDIA GTC conference compiled by Tencent Technology:
The Blackwell "Family Bucket" goes online
1) This year's "nuclear bomb", Blackwell Ultra: squeezing the toothpaste
NVIDIA introduced the Blackwell architecture at last year's GTC and launched the GB200 chip. This year the official naming is adjusted slightly: instead of the previously rumored GB300, it is simply called Blackwell Ultra.
From a hardware perspective, though, it is last year's chip with new HBM memory swapped in. In one sentence: Blackwell Ultra = Blackwell, large-memory edition.
Blackwell Ultra packages two Blackwell-architecture dies built on TSMC's N4P (5nm-class) process together with a Grace CPU, and carries more advanced 12-layer-stacked HBM3e memory, raising capacity to 288GB. Like the previous generation, it supports fifth-generation NVLink with 1.8TB/s of chip-to-chip interconnect bandwidth.
NVLink performance parameters over the years
On top of the memory upgrade, the Blackwell Ultra GPU's FP4-precision compute reaches 15 PetaFLOPS, and inference with the attention-acceleration mechanism runs 2.5 times faster than on Hopper-architecture chips.
2) Blackwell Ultra NVL72: AI inference dedicated server cabinet
Blackwell Ultra NVL72 Official Image
Like the GB200 NVL72, NVIDIA this year launched a similar product, the Blackwell Ultra NVL72 server cabinet. It consists of 18 compute trays, each containing 4 Blackwell Ultra GPUs and 2 Grace CPUs, for a total of 72 Blackwell Ultra GPUs and 36 Grace CPUs. It carries 20TB of memory with 576TB/s of total bandwidth, plus 9 NVLink switch trays (18 NVLink switch chips) delivering 130TB/s of NVLink bandwidth between nodes.
The cabinet is equipped with 72 CX-8 network cards providing 14.4TB/s of bandwidth. Quantum-X800 InfiniBand and Spectrum-X 800G Ethernet cards reduce latency and jitter and support large-scale AI clusters. The rack also integrates 18 BlueField-3 DPUs to strengthen multi-tenant networking, security, and data acceleration.
NVIDIA says the product is purpose-built for "the era of AI inference", targeting inference-side AI, Agents, and physical AI (synthesizing simulation data for robot and autonomous-driving training). Compared with the previous-generation GB200 NVL72, AI performance is up 1.5x; compared with the similarly positioned DGX cabinet of the Hopper generation, it can offer data centers a 50-fold increase in revenue opportunity.
According to official figures, inference on the 671-billion-parameter DeepSeek-R1 reaches 100 tokens per second on an H100-based system, while the Blackwell Ultra NVL72 solution reaches 1,000 tokens per second.
In wall-clock terms, the same inference task that takes an H100 system 1.5 minutes can be completed by Blackwell Ultra NVL72 in 15 seconds.
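Taken at face value, the two comparisons above imply different speedup factors. A quick back-of-the-envelope check (the figures are from the article; the helper functions are purely illustrative):

```python
# Back-of-the-envelope check of NVIDIA's published DeepSeek-R1 figures:
# 100 tokens/s on H100 vs 1,000 tokens/s on Blackwell Ultra NVL72,
# and 1.5 minutes vs 15 seconds for the same task.

def throughput_speedup(old_tps: float, new_tps: float) -> float:
    """Ratio of per-system token throughput."""
    return new_tps / old_tps

def latency_speedup(old_seconds: float, new_seconds: float) -> float:
    """Ratio of end-to-end task time."""
    return old_seconds / new_seconds

tps_gain = throughput_speedup(100, 1_000)   # 10x tokens per second
time_gain = latency_speedup(1.5 * 60, 15)   # 90 s down to 15 s, i.e. 6x
print(tps_gain, time_gain)
```

The two ratios differ (10x vs 6x) because per-system throughput and end-to-end latency measure different things; the quoted tasks need not be identical in length.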
Hardware parameters of Blackwell Ultra NVL72 and GB200 NVL72
According to NVIDIA, Blackwell NVL72-related products are expected to launch in the second half of 2025, with customers spanning server makers, cloud providers, and GPU-leasing services:
Server makers: Cisco/Dell/HPE/Lenovo/Supermicro, roughly 15 manufacturers in total
Cloud platforms: AWS/Google Cloud/Azure/Oracle and other mainstream providers
GPU leasing: CoreWeave/Lambda/Yotta, etc.
3) A preview of the real "nuclear bomb" GPU: the Rubin chip
On NVIDIA's roadmap, Blackwell Ultra is the headliner of GTC 2025.
But Jensen Huang also used the occasion to preview the next-generation GPU, based on the Rubin architecture and scheduled for 2026, along with the more powerful Vera Rubin NVL144 cabinet: 72 Vera CPUs plus 144 Rubin GPUs, using 288GB of HBM4 with 13TB/s of memory bandwidth, paired with sixth-generation NVLink and the CX9 network card.
How strong is it? FP4-precision inference compute reaches 3.6 ExaFLOPS, and FP8-precision training compute reaches 1.2 ExaFLOPS, 3.3 times the performance of Blackwell Ultra NVL72.
If that still isn't enough, in 2027 there will be an even stronger Rubin Ultra NVL576 cabinet, with FP4-precision inference and FP8-precision training compute of 15 ExaFLOPS and 5 ExaFLOPS respectively, 14 times Blackwell Ultra NVL72.
NVL144 and NVL576 parameters provided by NVIDIA
4) Blackwell Ultra edition of the DGX Super POD: a "supercomputing factory"
For customers whose needs the Blackwell Ultra NVL72 cannot satisfy but who do not need to build an ultra-large-scale AI cluster, NVIDIA's answer is the plug-and-play DGX Super POD AI supercomputing factory, likewise based on Blackwell Ultra.
As a plug-and-play AI supercomputing factory, DGX Super POD targets scenarios such as generative AI, AI Agents, and physical simulation, covering the full cycle of compute scaling from pre-training and post-training to production, with Equinix as the first service provider offering liquid-cooled and air-cooled infrastructure support.
DGX SuperPod built by Blackwell Ultra
The DGX Super POD based on Blackwell Ultra comes in two versions: one built from DGX GB300 systems (Grace CPU plus Blackwell Ultra GPU, liquid-cooled) and one from DGX B300 systems (x86 CPUs, air-cooled).
5) DGX Spark and DGX Station
At CES in January this year, NVIDIA showed a $3,000 concept AI PC called Project DIGITS; it now has an official name, DGX Spark.
On specs, it carries the GB10 chip with 1 PetaFLOPS of FP4-precision compute, 128GB of built-in LPDDR5X memory, a CX-7 network card, and 4TB of NVMe storage. It runs the Linux-based DGX OS, supports frameworks such as PyTorch, and comes pre-installed with some of NVIDIA's basic AI software development tools. It can run models of up to 200 billion parameters in a chassis close in size to a Mac mini, and two DGX Sparks can be linked to run models of over 400 billion parameters.
Although we call it an AI PC, it essentially still belongs to the supercomputing category, which is why it sits in the DGX product line rather than among consumer products like RTX.
Some critics, however, note that the advertised FP4 performance has limited practical usability: converted to FP16 precision, it only rivals an RTX 5070, or even the $250 Arc B580, making its value for money questionable.
DGX Spark computer and DGX Station workstation
Besides the officially named DGX Spark, NVIDIA also launched DGX Station, an AI workstation based on Blackwell Ultra. It pairs one Grace CPU and one Blackwell Ultra GPU with 784GB of unified memory and a CX-8 network card, delivering 20 PetaFLOPS of AI compute (precision not officially stated, presumably also FP4).
6) RTX sweeps across AI PCs and even squeezes into the data center
The SKUs introduced above, built on Grace CPUs and Blackwell Ultra GPUs, are all enterprise-grade products. Given the clever use of cards like the RTX 4090 for AI inference, NVIDIA used this GTC to further fuse Blackwell with the RTX line, launching a wave of AI-PC-oriented GPUs with GDDR7 memory that cover laptops, desktops, and even data centers.
Desktop GPUs: RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q Workstation Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, and RTX PRO 4000 Blackwell
NVIDIA's AI "toolbox" tailored for enterprise computing
The above is only a partial list of the SKUs customized for different scenarios around the Blackwell Ultra chip, spanning workstations to data-center clusters. NVIDIA calls it the "Blackwell Family"; rendered playfully in Chinese, the "Blackwell Family Bucket".
NVIDIA Photonics: a CPO system standing on the shoulders of its partners
The idea of Co-Packaged Optics (CPO) is, simply put, to package switch chips and optical modules together, converting between optical and electrical signals inside the package and exploiting the transmission performance of optics to the fullest.
The industry had long discussed NVIDIA's CPO network switches, but they had yet to launch. Jensen Huang explained on stage why they matter: because data centers use fiber connections so extensively, optical networking consumes about 10% of compute resources, and the cost of optical links directly constrains the scale-out network of compute nodes and improvements in AI performance density.
The parameters of the two silicon optical co-packaged chips Quantum-X and Spectrum-X displayed on GTC
At this year's GTC, NVIDIA launched the Quantum-X silicon photonics co-packaged chip, the Spectrum-X silicon photonics co-packaged chip, and three derivative switch products: the Quantum 3450-LD, Spectrum SN6810, and Spectrum SN6800.
These products all fall under "NVIDIA Photonics". NVIDIA describes it as a platform co-developed with CPO partners; for example, its micro-ring modulator (MRM) is optimized on top of TSMC's optical engine, supports high-power, high-efficiency laser modulation, and uses detachable fiber connectors.
Interestingly, according to earlier industry reports, TSMC's micro-ring modulator (MRM) was built together with Broadcom on TSMC's 3nm process and advanced packaging technologies such as CoWoS.
By NVIDIA's figures, a Photonics switch with integrated optics delivers 3.5 times the performance of a traditional switch, along with 1.3 times the deployment efficiency and more than 10 times the scaling resilience.
Model efficiency vs. DeepSeek: the software ecosystem empowers the AI Agent
Jensen Huang sketched the grand "pie" of AI infrastructure on stage
In this roughly two-hour GTC keynote, Jensen Huang spent only about half an hour on software and embodied intelligence, so many of the details below come from official documents rather than the stage.
1) NVIDIA Dynamo: NVIDIA's new "CUDA" for the inference domain
NVIDIA Dynamo is without doubt the software bombshell of this event.
It is open-source software built to accelerate and scale inference across an entire data center. Dynamo's performance figures are striking: on the existing Hopper architecture, it can double the throughput of a standard Llama model, and for specialized inference models such as DeepSeek, its intelligent inference optimizations can raise the number of tokens generated per GPU by more than 30 times.
Jensen Huang demonstrated that Blackwell with Dynamo can outperform Hopper by more than 25 times
Dynamo's gains come primarily from disaggregation. It splits the different computational stages of an LLM (understanding the user query versus generating the response) across different GPUs, letting each stage be optimized independently, which raises throughput and speeds up responses.
System architecture of Dynamo
For example, in the input-processing, or prefill, stage, Dynamo allocates GPU resources efficiently to digest user input. The system uses several groups of GPUs to process a query in parallel, so the work is both more spread out and faster. Using FP4 precision, multiple GPUs "read" and "understand" the user's question simultaneously: one group handles the background knowledge of "World War II", another the historical material on its "causes", and a third the timeline and events of its "course". The stage resembles several research assistants consulting a mass of material at once.
Generating the output tokens, the decode stage, instead needs the GPUs to be more focused and coherent. Rather than GPU count, this stage demands high bandwidth to absorb the "thinking" produced in the previous stage, and therefore far more cache reads. Dynamo optimizes inter-GPU communication and resource allocation to keep response generation coherent and efficient. On one hand, it fully exploits the high-bandwidth NVLink of the NVL72 architecture to maximize token-generation efficiency. On the other, a "Smart Router" directs each request to the GPUs that already hold the relevant KV (key-value) cache, avoiding redundant computation and greatly improving speed. The GPU resources freed by avoiding recomputation can then be dynamically reassigned to newly arriving requests.
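The prefill/decode split and KV-cache-aware routing described above can be sketched as a toy scheduler. This is a minimal illustration of the idea, not Dynamo's actual API; every class and method name here is invented for the example:

```python
import hashlib
from collections import defaultdict

class ToyRouter:
    """Toy KV-cache-aware router (illustrative only, not NVIDIA Dynamo).

    Prefill and decode run on separate GPU pools; a request whose prompt
    prefix was seen before is routed to the decode GPU that already holds
    that prefix's KV cache, avoiding recomputation.
    """

    def __init__(self, prefill_gpus, decode_gpus):
        self.prefill_gpus = prefill_gpus      # pool for input processing
        self.decode_gpus = decode_gpus        # pool for token generation
        self.kv_location = {}                 # prefix hash -> decode GPU
        self.load = defaultdict(int)          # GPU -> active requests

    @staticmethod
    def _prefix_key(prompt: str, prefix_len: int = 32) -> str:
        return hashlib.sha256(prompt[:prefix_len].encode()).hexdigest()

    def route(self, prompt: str):
        # Prefill: pick the least-loaded prefill GPU.
        prefill = min(self.prefill_gpus, key=lambda g: self.load[g])
        self.load[prefill] += 1
        # Decode: reuse the GPU holding this prefix's KV cache if possible.
        key = self._prefix_key(prompt)
        cache_hit = key in self.kv_location
        if cache_hit:
            decode = self.kv_location[key]
        else:
            decode = min(self.decode_gpus, key=lambda g: self.load[g])
            self.kv_location[key] = decode
        self.load[decode] += 1
        return prefill, decode, cache_hit
```

Routing the same prompt twice yields a cache hit on the second call, landing on the same decode GPU, which is the behavior the "Smart Router" exploits to skip redundant prefill work.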
This architecture closely resembles Kimi's Mooncake architecture, but NVIDIA has done more supporting work at the underlying infra level. Mooncake can lift inference throughput by roughly 5 times; Dynamo's improvement is more significant still.
Among Dynamo's key innovations: the "GPU Planner" dynamically adjusts GPU allocation to match load; the "Low-Latency Communication Library" optimizes data transfer between GPUs; the "Memory Manager" intelligently shuttles inference data between storage tiers of different cost, further cutting operating expenses; and the smart router, an LLM-aware routing system, directs each request to the most suitable GPU to reduce redundant computation. Together these capabilities optimize GPU load.
This software inference system scales efficiently to large GPU clusters, letting a single AI query fan out seamlessly to as many as 1,000 GPUs to make full use of data-center resources.
For GPU operators, these improvements sharply reduce the cost per million tokens and greatly raise capacity. Each user, meanwhile, gets more tokens per second and faster responses, improving the experience.
With Dynamo, servers strike the golden balance between throughput and response speed
Unlike CUDA, which sits at the bottom of the stack as the foundation of GPU programming, Dynamo is a higher-level system focused on intelligently allocating and managing large-scale inference workloads. It owns the distributed scheduling layer of inference optimization, sitting between applications and the underlying compute infrastructure. But just as CUDA reshaped the GPU-computing landscape over a decade ago, Dynamo may likewise open a new paradigm for inference software-hardware efficiency.
Dynamo is fully open source and supports all mainstream frameworks from PyTorch to TensorRT. Even open-sourced, it remains a moat: like CUDA, it only works on NVIDIA GPUs and forms part of NVIDIA's AI inference software stack.
With this software upgrade, NVIDIA has built its defense against dedicated inference ASICs such as Groq's: software and hardware must be matched to dominate inference infrastructure.
2) The new Llama Nemotron models are efficient, but still can't beat DeepSeek
Although Dynamo is genuinely impressive on server utilization, in training models NVIDIA still trails the true experts a little.
At this GTC, NVIDIA unveiled a new model family, Llama Nemotron, emphasizing efficiency and accuracy. Derived from the Llama series and specially fine-tuned by NVIDIA, it has been algorithmically pruned and optimized to be lighter than the original Llama, at just 49B parameters, while offering o1-style inference capability. Like Claude 3.7 and Grok 3, Llama Nemotron has a built-in reasoning-capability switch that users can choose to enable. The series comes in three tiers: entry-level Nano, mid-range Super, and flagship Ultra, each aimed at a different scale of enterprise need.
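The reasoning switch mentioned above is typically exposed through the chat prompt. A minimal sketch of how such a toggle can be wired into a request, assuming a system-prompt control string ("detailed thinking on/off" has been reported for the Nemotron family, but treat the exact string as an assumption):

```python
# Sketch of a user-facing reasoning toggle for an LLM with a built-in
# "thinking" mode. ASSUMPTION: the toggle is a short system-prompt control
# string; the exact wording is illustrative, not an official API.

def build_messages(user_prompt: str, reasoning: bool) -> list:
    """Build a chat request that enables or disables reasoning mode."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# With reasoning on, the model is expected to emit its chain of thought
# before the final answer; with it off, it answers directly.
msgs = build_messages("How many GPUs are in an NVL72 rack?", reasoning=True)
```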
Specific data of Llama Nemotron
On the efficiency side, the model's fine-tuning dataset consists entirely of synthetic data generated by NVIDIA itself, about 60B tokens in total. Where DeepSeek V3 took 1.3 million H100 hours for a full training run, this model, with only 1/15 the parameters of DeepSeek V3, consumed 360,000 H100 hours just for fine-tuning. Measured that way, its training efficiency is still a notch below DeepSeek's.
On inference efficiency, the Llama Nemotron Super 49B model does far outperform its predecessor: its token throughput reaches 5 times that of Llama 3 70B, over 3,000 tokens per second on a single data-center GPU. Yet in the data released on the final day of DeepSeek's open-source week, each H800 node averaged about 73.7k tokens/s of input during prefill (including cache hits) and about 14.8k tokens/s of output during decode. The gap remains considerable.
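A rough sanity check on the training-efficiency comparison above. The parameter counts and GPU-hour figures come from the article; GPU hours per billion parameters is our own crude yardstick, and the two runs are not strictly comparable (full pre-training vs fine-tuning):

```python
# Crude efficiency yardstick for the figures quoted in the article:
# DeepSeek V3 (671B params, 1.3M H100 hours, full training) vs
# Llama Nemotron Super (49B params, 360K H100 hours, fine-tuning only).

def gpu_hours_per_billion_params(gpu_hours: float, params_b: float) -> float:
    """H100 hours spent per billion model parameters."""
    return gpu_hours / params_b

deepseek_v3 = gpu_hours_per_billion_params(1_300_000, 671)  # full pre-training
nemotron = gpu_hours_per_billion_params(360_000, 49)        # fine-tuning only

# DeepSeek spent roughly 1,937 hours per billion params on FULL training;
# Nemotron spent roughly 7,347 hours per billion params on FINE-TUNING alone.
print(round(deepseek_v3), round(nemotron))
```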
On quality, the 49B Llama Nemotron Super does beat the DeepSeek-R1-distilled Llama 70B model on every metric. But given the recent flurry of small, high-performing models such as Qwen's QwQ 32B, Llama Nemotron Super is unlikely to stand out among the models that can go toe-to-toe with R1.
Most tellingly, this model amounts to hard evidence that DeepSeek may understand how to work GPUs during training better than NVIDIA does.
3) The new models are just the appetizer for NVIDIA's AI Agent ecosystem; NVIDIA AIQ is the main course
Why would NVIDIA build an inference model? Chiefly to prepare for the next AI explosion Jensen Huang cares about: the AI Agent. Now that players like OpenAI and Anthropic have, through Deep Research and MCP, gradually laid the groundwork for Agents, NVIDIA clearly also believes the Agent era has arrived.
The NVIDIA AIQ project is NVIDIA's attempt: it provides a ready-made workflow for an AI-agent planner built around the Llama Nemotron inference model. The project sits at NVIDIA's Blueprint tier, meaning a pre-configured reference workflow, a set of templates that help developers integrate NVIDIA's technologies and libraries more easily. AIQ is NVIDIA's Agent template.
NVIDIA AIQ architecture
Like Manus, it integrates external tools such as web search engines and other specialized AI agents, so the Agent itself can both search and wield various tools. Guided by the planning of the Llama Nemotron reasoning model, it reflects on and refines its approach to complete the user's task. It also supports building multi-Agent workflow architectures.
ServiceNow system based on this template
Where it goes beyond Manus is its sophisticated RAG system for enterprise documents. The system runs through a series of steps, extraction, embedding, vector storage, reranking, and final processing by the LLM, to ensure enterprise data can be put to use by the Agent.
On top of that, NVIDIA has also launched an AI data platform that connects AI inference models to enterprise data systems, forming a Deep Research tailored to enterprise data. It marks a notable evolution in storage technology: storage systems go from mere data warehouses to intelligent platforms with active inference and analysis capabilities.
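The document pipeline described above (extraction, embedding, vector storage, reranking, handing off to the LLM) can be sketched with pure-Python stand-ins. This is a toy illustration of the stages, not NVIDIA AIQ's actual components; a real system would use an embedding model, a vector database, and a reranker model:

```python
# Toy RAG pipeline: ingest (extract + embed + store), then retrieve + rank.
# The character-frequency "embedding" is a deliberate stand-in for a real
# embedding model; names and logic are illustrative only.

def embed(text: str) -> list:
    """Stand-in embedding: letter-frequency vector over a-z."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [text.lower().count(c) / max(len(text), 1) for c in alphabet]

def similarity(a: list, b: list) -> float:
    """Dot product between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

class ToyRAG:
    def __init__(self):
        self.store = []  # vector store: (document, embedding) pairs

    def ingest(self, docs):
        """Extract + embed + store each document."""
        for d in docs:
            self.store.append((d, embed(d)))

    def retrieve(self, query: str, k: int = 2) -> list:
        """Rank documents by similarity; the top-k would go to the LLM."""
        q = embed(query)
        ranked = sorted(self.store, key=lambda t: similarity(q, t[1]),
                        reverse=True)
        return [doc for doc, _ in ranked[:k]]
```

A query about "gpu" would surface a GPU-related document ahead of unrelated ones, mirroring how the Agent's RAG stage narrows enterprise data before the LLM sees it.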
Composition of AI Data Platform
AIQ also emphasizes observability and transparency mechanisms, which matter greatly for security and later improvement: the development team can monitor the Agent's activity in real time and keep optimizing the system from performance data.
Overall, NVIDIA AIQ is a standard Agent-workflow template offering a range of Agent capabilities; think of it as a Dify-style Agent-building tool, evolved for the inference era.
A humanoid-robot foundation model is released: NVIDIA aims for a fully closed-loop embodied ecosystem
1) Cosmos: giving embodied intelligence an understanding of the world
If the focus on Agents is a bet on the present, NVIDIA's layout in embodied intelligence is squarely a bet on the future.
NVIDIA has lined up all three elements: model, data, and computing power.
Start with the model. At this GTC, NVIDIA released an upgraded version of Cosmos, the embodied-intelligence foundation model it announced in January this year.
Cosmos is a model that predicts future imagery from current imagery. It takes text or image input, generates detailed video, and predicts how a scene will evolve by combining its current state (image/video) with actions (prompts/control signals). Because this demands an understanding of the world's physical causality, NVIDIA calls Cosmos a World Foundation Model (WFM).
The basic architecture of Cosmos
For embodied intelligence, the most fundamental ability is predicting how the machine's behavior will affect the external world; only then can the model plan its actions from those predictions. The world model is therefore the foundation model of embodied intelligence. With this base model for predicting how behavior and time change the physical world, fine-tuning on concrete datasets such as autonomous driving or robot tasks lets the model meet the practical deployment needs of all kinds of physically embodied intelligence.
The model comprises three capabilities. The first, Cosmos Transfer, converts structured video-text input into controllable, photorealistic video output, conjuring large-scale synthetic data from text alone. This attacks embodied intelligence's biggest current bottleneck, the shortage of data. Moreover, the generation is "controllable": users can specify parameters (weather conditions, object properties, and so on) and the model adjusts its output accordingly, making data generation more controllable and targeted. The whole pipeline can also combine Omniverse with Cosmos.
Cosmos builds its reality simulation on top of Omniverse
The second part, Cosmos Predict, generates virtual world states from multimodal input, supporting multi-frame generation and action-trajectory prediction. Given an initial state and a final state, the model can generate a plausible intermediate process. This is the core capability of cognition and construction in the physical world.
The third part, Cosmos Reason, is an open, fully customizable model with spatiotemporal awareness. It understands video data through chain-of-thought reasoning and predicts the outcomes of interactions. This is the capability of planning and predicting behavior.
With these three capabilities layered together, Cosmos realizes the complete behavioral chain from inputs of real-image tokens plus text-command prompt tokens to outputs of machine-action tokens.
The foundation model does appear to work: barely two months after launch, the three leading companies 1X, Agility Robotics, and Figure AI had all started using it. NVIDIA may not lead in large language models, but in embodied intelligence it is genuinely in the first tier.
2) Isaac GR00T N1, the world's first humanoid-robot foundation model
With Cosmos in place, NVIDIA naturally used this framework to fine-tune Isaac GR00T N1, a foundation model dedicated to humanoid robots.
Isaac GR00T N1's dual system architecture
It adopts a dual-system architecture: a fast-reacting "System 1" and a deliberative "System 2". Comprehensive fine-tuning enables it to handle generic tasks such as grasping, moving objects, and dual-arm manipulation. It can also be fully customized for a specific robot, with developers carrying out further training on real or synthetic data, which means one model can in practice be deployed on robots of widely varying form factors.
For example, NVIDIA worked with Google DeepMind and Disney to develop the Newton physics engine, and used Isaac GR00T N1 as the base to drive a rather rare little Disney BDX robot, a testament to its versatility. Newton is an extremely fine-grained physics engine, precise enough that a physics-based reward system is sufficient to train embodied intelligence in a virtual environment.
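The dual-system split described above can be sketched as a simple control loop: a slow, deliberative "System 2" plans subgoals at a low rate while a fast "System 1" reacts every tick. The class names, tick rates, and planning logic are illustrative assumptions, not GR00T N1's actual architecture:

```python
# Toy dual-system control loop. System 2 (slow) decomposes a task into
# subgoals; System 1 (fast) emits a low-level action every tick toward
# the current subgoal. All names and logic are illustrative only.

class System2:
    """Slow planner: turns a task into a short sequence of subgoals."""
    def plan(self, task: str) -> list:
        return [f"{task}:approach", f"{task}:grasp", f"{task}:move"]

class System1:
    """Fast controller: emits a low-level action for the current subgoal."""
    def act(self, subgoal: str, sensor: float) -> str:
        return f"motor({subgoal}, correction={sensor:+.2f})"

def control_loop(task: str, sensor_readings: list, replan_every: int = 2) -> list:
    s1, s2 = System1(), System2()
    subgoals = s2.plan(task)
    actions = []
    for tick, reading in enumerate(sensor_readings):
        # System 2 advances subgoals at a lower rate than System 1 acts.
        idx = min(tick // replan_every, len(subgoals) - 1)
        actions.append(s1.act(subgoals[idx], reading))
    return actions
```

The point of the split is rate decoupling: the reactive layer never waits on the deliberative one, which is what lets a robot stay stable while "thinking".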
Jensen Huang and the BDX robot interact "warmly" on stage
3) Data generation: covering every front
NVIDIA has combined NVIDIA Omniverse with the NVIDIA Cosmos Transfer world foundation model described above into the Isaac GR00T Blueprint, which can generate large volumes of synthetic motion data from a small number of human demonstrations for robot-manipulation training. Using the Blueprint's first components, NVIDIA generated 780,000 synthetic trajectories in just 11 hours, equivalent to 6,500 hours (about 9 months) of human demonstration data. A large share of Isaac GR00T N1's data comes from here, and this data improved GR00T N1's performance by 40% over using real data alone.
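A quick check of the data-amplification claim above (780,000 trajectories in 11 hours, equivalent to 6,500 hours of human demonstration; the "amplification" framing is our own):

```python
# Sanity check on the synthetic-data figures quoted in the article.

def amplification(equiv_human_hours: float, wall_clock_hours: float) -> float:
    """Demonstration hours replaced per hour of synthetic generation."""
    return equiv_human_hours / wall_clock_hours

factor = amplification(6_500, 11)
# Each hour of generation stands in for roughly 591 hours of human demos.
print(round(factor))
```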
Twin Simulation System
With the Omniverse virtual environment plus Cosmos Transfer's real-world image generation, NVIDIA can supply large volumes of high-quality data for each of these models. The second element, data, is thus also covered by NVIDIA.
4) A trinity of computing systems: a robot-computing empire from training to the edge
Since last year, Jensen Huang has been pushing the concept of "three computers" at GTC. The first is DGX, the large GPU server used to train AI, embodied intelligence included. The second is AGX, NVIDIA's embedded computing platform for edge computing and autonomous systems, used to deploy AI at the endpoint, for instance as the core chip for autonomous driving or robots. The third is the data-generation computer: Omniverse plus Cosmos.
The three major computing systems of embodied intelligence
Jensen Huang emphasized this system again at this GTC, noting in particular that on this computing foundation, robots in the billions can be born. From training to deployment, all the computing power comes from NVIDIA, closing this loop as well.
Conclusion
Judged purely against the previous-generation Blackwell chip, Blackwell Ultra's hardware hardly lives up to the earlier "nuclear bomb" or "big bang" billing, and even carries a whiff of toothpaste-squeezing.
From the roadmap's perspective, though, all of this sits inside Jensen Huang's plan. Next year and the year after, the Rubin architecture will bring large jumps in chip process, transistor count, rack integration, GPU interconnect, cabinet interconnect, and other specs. As the Chinese saying goes, the best is yet to come.
In contrast to the incremental gains on the hardware side, NVIDIA's software has been advancing by leaps and bounds over the past two years.
Across NVIDIA's software ecosystem, the services at the three tiers of NeMo, NIM, and Blueprint cover model optimization, model packaging, and full-stack application construction respectively, and the cloud-service ecosystem overlaps completely with NVIDIA AI. With this new Agent layer added, NVIDIA wants to take every slice of the AI-infra pie except the foundation model itself.
In software, Jensen Huang's appetite is as big as NVIDIA's stock price.
In the robot market, NVIDIA's ambitions are greater still. It controls all three key elements: model, data, and computing power. Having missed the chance to lead in foundation language models, it is making up the deficit with embodied intelligence. Faintly but unmistakably, an embodied-intelligence edition of the monopoly giant is already taking shape on the horizon.
Along the way, every link and every product maps to a potential market worth hundreds of billions. Jensen Huang, the lucky gambler who bet the company in his early years, has started an even bigger wager with the money earned from the GPU monopoly.
If either the software or the robotics side of this wager pays off, NVIDIA will be the Google of the AI era, the apex monopolist of the food chain.
Then again, looking at the profit margins on NVIDIA's GPUs, such a future is hard to look forward to wholeheartedly.
Luckily, this is also the biggest gamble Jensen Huang has ever run in his life, and the outcome remains anyone's guess.