Nvidia releases its most powerful open-source model, Nemotron 3 Ultra! Built for AI agent tasks, with up to 5x higher throughput and up to 30% lower costs

Chip giant NVIDIA today (June 4) announced the launch of its new open-source flagship model, Nemotron 3 Ultra.
The model is designed specifically for long-running AI agents and complex multi-agent workflows, and has 550 billion total parameters.
Thanks to a hybrid architecture and several other technical innovations, Nemotron 3 Ultra performs strongly across a range of benchmarks while delivering up to 5x higher throughput and cutting agent task costs by up to 30%.

Table of Contents


  • 55 billion active parameters, transforming into the "brain" of AI workflows
  • Five major technological innovations: 5x throughput boost, 30% cost reduction
  • Fully open source, accelerating enterprise AI application deployment

As artificial intelligence (AI) moves toward greater automation and more complex workflows, the compute costs and efficiency bottlenecks of multi-agent systems have become the biggest pain points for companies adopting AI.
To address this problem, NVIDIA officially released Nemotron 3 Ultra, the flagship of the Nemotron 3 family, on June 4, 2026.

This powerful open-source model is built specifically for long-running AI agents.
Traditional multi-agent workflows require continuous re-planning, tool calls, delegation to sub-agents, and maintenance of lengthy contexts, so token consumption often balloons, driving up costs and raising the risk that agents drift from their goals.
Nemotron 3 Ultra was built to overcome these challenges.
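To make the cost problem concrete, here is a minimal sketch, assuming a simple hypothetical model in which every agent step re-sends the entire accumulated context and each step appends a fixed number of tokens (plans, tool results, sub-agent replies). The function name and all numbers are illustrative, not figures from NVIDIA.

```python
def cumulative_input_tokens(base_context: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens across a run in which every step re-sends the full history.

    Hypothetical model: step i (counting from 0) sends the base context plus
    everything appended by the i earlier steps, each of which adds
    tokens_per_step tokens.
    """
    return sum(base_context + i * tokens_per_step for i in range(steps))


if __name__ == "__main__":
    # Illustrative numbers only: a 2,000-token prompt, 1,500 new tokens per step.
    for steps in (10, 20, 40):
        total = cumulative_input_tokens(base_context=2_000, tokens_per_step=1_500, steps=steps)
        print(f"{steps:>2} steps -> {total:,} input tokens")
```

Under these assumptions, going from 10 to 20 steps raises billed input tokens from 87,500 to 325,000, roughly 3.7x, because total input grows quadratically with the number of steps rather than linearly.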

55 billion active parameters, transforming into the "brain" of AI workflows

Nemotron 3 Ultra uses a Mixture-of-Experts (MoE) architecture: of its 550 billion total parameters, only about 55 billion are active for any given token, which keeps per-token compute far below that of a dense model of the same total size.
In a multi-agent workflow, the model is positioned as the "Orchestrator" or frontier reasoning engine, handling heavy tasks such as deep planning, complex synthesis, and logical verification, while lightweight models handle routine execution and tool calls.
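The article does not describe how Nemotron 3 Ultra's router (LatentMoE) works internally, so the following is only a generic toy of top-k MoE routing, assuming 8 hypothetical experts, top-2 selection, and a scalar input. It illustrates the idea behind "550 billion total, 55 billion active": every expert's parameters exist, but each token only runs through the few experts the router selects.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts, a common MoE
    convention; this is not a description of LatentMoE.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_logits, k=2):
    """Toy MoE layer: only the k routed experts are evaluated for this input."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Eight hypothetical experts, each just scaling its input by a constant.
    experts = [lambda x, s=s: s * x for s in range(8)]
    logits = [0.1, 2.0, 0.3, 1.5, -1.0, 0.0, 0.2, 0.4]  # made-up router scores
    print(top_k_route(logits, k=2))           # experts 1 and 3 are selected
    print(moe_forward(1.0, experts, logits))  # a weighted mix of experts 1 and 3
    print(f"active share: {55e9 / 550e9:.0%}")  # ratio of active to total parameters
```

In a real MoE model the experts are large feed-forward blocks and routing happens independently for every token at every MoE layer, so the other experts' weights sit idle for that token.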

On performance, Nemotron 3 Ultra posts strong results on several agent-focused benchmarks.
It scored 91% on the agent productivity benchmark PinchBench, 40% on the long-horizon planning benchmark EnterpriseOps-Gym, and 67% on the coding benchmark Terminal-Bench 2.0.
Despite its comparatively small number of active parameters, its overall reasoning ability matches or exceeds that of mainstream open-source models such as GLM 5.1, Kimi K2.6, and Qwen3.5.

Five major technological innovations: 5x throughput boost, 30% cost reduction

To reach this level of performance and speed, NVIDIA built five core technical innovations into Nemotron 3 Ultra.
The first is "Hybrid Mamba-Transformer layers," which combine Mamba's efficiency on long sequences with the Transformer's precise recall of facts.
The second is support for "NVFP4 quantization," which allows the model weights to be deployed on Hopper, Blackwell, and Ampere GPU architectures.
On Blackwell, NVFP4 can raise output throughput by up to 5x compared with the traditional BF16 format.
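As a rough illustration of what 4-bit block quantization does, here is a minimal sketch, assuming the commonly described NVFP4 layout: values stored as FP4 (E2M1, magnitudes from 0 to 6) in blocks of 16 that share one scale factor. It simplifies the real format by keeping the block scale as a Python float instead of FP8 (E4M3) and by omitting the per-tensor scale; the helper names are hypothetical. The memory arithmetic covers weights only and does not by itself explain the 5x throughput figure, which also relies on Blackwell's hardware FP4 support.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16


def quantize_block(values):
    """Quantize up to 16 floats to E2M1 values sharing one scale (simplified NVFP4)."""
    assert 0 < len(values) <= BLOCK_SIZE
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block's largest magnitude to 6.0
    codes = []
    for v in values:
        # Round |v| / scale to the nearest grid point, then restore the sign.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(math.copysign(mag, v))
    return scale, codes


def dequantize_block(scale, codes):
    """Approximate reconstruction of the original block."""
    return [scale * c for c in codes]


def weight_gigabytes(num_params, bits_per_weight):
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    scale, codes = quantize_block([0.3, -1.2, 6.0, 0.0])
    print(scale, codes)  # 1.0 [0.5, -1.0, 6.0, 0.0]
    print(dequantize_block(scale, codes))
    # BF16: 16 bits per weight. NVFP4 (simplified): 4 bits plus one 8-bit scale per 16 weights.
    print(weight_gigabytes(550e9, 16))          # about 1100 GB
    print(weight_gigabytes(550e9, 4 + 8 / 16))  # about 309 GB
```

The shared per-block scale is what lets such a coarse 4-bit grid track weights of very different magnitudes: each block of 16 is rescaled so its largest value lands on the top of the grid.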

The remaining three are LatentMoE (an efficient expert-routing design for complex workloads), multi-token prediction (MTP, which predicts several future tokens in a single forward pass to speed up long-form generation), and multi-teacher online distillation (MOPD).
Together, these innovations substantially reduce total token consumption during task processing, cutting enterprise agent task costs by up to 30%.
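The article says MTP predicts several future tokens per forward pass. One common way to turn such drafts into a speedup is self-speculative decoding, in which the main model checks the drafted tokens in one pass and keeps the longest correct prefix. The sketch below assumes greedy verification over hypothetical token lists; the article does not say that Nemotron 3 Ultra uses exactly this scheme.

```python
def accept_draft(draft, verified):
    """Return the tokens kept after one greedy verification pass.

    draft:    k tokens proposed by (hypothetical) MTP heads.
    verified: k + 1 tokens the main model itself picks at those positions,
              computed in one batched forward pass over the draft.
    Drafted tokens are kept while they match; at the first mismatch the main
    model's token replaces the draft and the rest is discarded. If every draft
    token matches, the main model's extra (k+1)-th token is kept as a bonus.
    """
    if len(verified) != len(draft) + 1:
        raise ValueError("verified must contain exactly one more token than draft")
    accepted = []
    for drafted, checked in zip(draft, verified):
        if drafted != checked:
            accepted.append(checked)  # main model's correction ends this pass
            return accepted
        accepted.append(drafted)
    accepted.append(verified[-1])  # all drafts matched: keep the bonus token
    return accepted


if __name__ == "__main__":
    print(accept_draft(["the", "cat", "sat"], ["the", "cat", "ran", "off"]))
    # ['the', 'cat', 'ran']: three tokens from one verification pass
    print(accept_draft(["on", "the"], ["on", "the", "mat"]))
    # ['on', 'the', 'mat']
```

In the worst case a pass still yields one token (the main model's own), so greedy verification never produces fewer tokens per pass than ordinary decoding; the actual gain depends on how often the drafts match.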

Fully open source, accelerating enterprise AI application deployment

For training data, Nemotron 3 Ultra builds on a pre-training corpus of more than 10 trillion tokens, supplemented by over 212 billion domain-specific tokens (including legal documents, Wikipedia-style text, and recent GitHub code).
NVIDIA stresses that the model is fully open source: it is released under the permissive OpenMDW-1.1 license, and the complete model weights, training recipes, and data pipelines are available to the community.

Currently, developers can access and deploy Nemotron 3 Ultra on mainstream platforms such as Hugging Face, NVIDIA Build, and NIM.
With strong long-context performance (95% on RULER at a 1M-token context) and high cost-effectiveness, the model is expected to become a powerful tool for enterprises automating customer service, supply chain management, IT security, and chip design verification.
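One reason a hybrid Mamba-Transformer design helps at million-token contexts is memory: attention layers keep a key-value (KV) cache that grows with every token, while Mamba layers carry a fixed-size state. The sketch below compares KV-cache size for two hypothetical 64-layer stacks, one all-attention and one with attention in only 8 layers. The layer counts, head counts, and head size are illustrative assumptions, not Nemotron 3 Ultra's published configuration, and the small fixed Mamba state is ignored.

```python
def kv_cache_bytes(attention_layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache size: one key and one value vector per KV head, per token, per attention layer.

    bytes_per_elem=2 assumes a BF16 cache.
    """
    return 2 * attention_layers * kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    seq_len = 1_000_000  # roughly the 1M-token setting of the RULER result
    # Hypothetical configuration: 8 KV heads of dimension 128 per attention layer.
    all_attention = kv_cache_bytes(attention_layers=64, kv_heads=8, head_dim=128, seq_len=seq_len)
    hybrid = kv_cache_bytes(attention_layers=8, kv_heads=8, head_dim=128, seq_len=seq_len)
    print(f"all-attention: {all_attention / 1e9:.1f} GB")  # 262.1 GB
    print(f"hybrid:        {hybrid / 1e9:.1f} GB")         # 32.8 GB
```

Because the cache scales linearly with the number of attention layers, moving most layers to Mamba shrinks it proportionally (8x in this made-up example), which frees memory for longer contexts or larger batches.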

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.