NVIDIA releases Nemotron 3 Ultra, its most powerful open-source model: built for AI agent tasks, with up to 5x throughput and up to 30% lower costs
Chip giant NVIDIA announced today (the 4th) the launch of its new open-source flagship model, Nemotron 3 Ultra.
The model is designed for long-running AI agents and complex multi-agent workflows, and has 550 billion total parameters.
By combining a hybrid architecture with several other technical innovations, Nemotron 3 Ultra performs strongly across a range of benchmarks while delivering up to 5x higher throughput and cutting agent task costs by up to 30%.
As artificial intelligence (AI) moves toward high automation and complex workflows, the computational costs and efficiency bottlenecks of "multi-agent systems" have become the biggest pain points for companies adopting AI.
To address this, NVIDIA officially released Nemotron 3 Ultra, the flagship of the Nemotron 3 family, on June 4, 2026.
This is a powerful open-source model designed specifically for "long-running AI agents."
Traditional multi-agent workflows require continuous re-planning, tool invocation, sub-agent delegation, and long-context maintenance, so token consumption often balloons, driving up costs and raising the risk that agents drift from their goal.
Nemotron 3 Ultra was created to overcome these challenges.
55 billion active parameters, transforming into the "brain" of AI workflows
Nemotron 3 Ultra adopts a Mixture-of-Experts (MoE) architecture with 550 billion total parameters, of which only 55 billion (10%) are active in each forward pass, which keeps inference efficient.
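As a rough illustration of how an MoE layer keeps most parameters idle, the sketch below routes one token to the top-k experts by router score. The expert count (20), the k value (2), and the function names are hypothetical choices for illustration, not details from NVIDIA; only the 550 billion and 55 billion figures come from the article. In a real model the active share is not simply k divided by the expert count, because attention layers and embeddings are always active.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Figures quoted in the article.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9
active_share = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.1, i.e. 10%

# Hypothetical layer: 20 experts, 2 selected per token.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(20)]
chosen = route_top_k(scores, k=2)
```

Only the experts listed in `chosen` would run for this token; the other 18 are skipped, which is where the compute saving comes from.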
In multi-agent workflows, the model is positioned as an "Orchestrator," or frontier reasoning engine: it handles high-load tasks such as deep planning, complex synthesis and analysis, and logical verification, while routine execution and tool calls are left to lightweight models.
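The division of labor described above can be sketched as a simple dispatcher. The task kinds, the `Task` class, and the labels "orchestrator" and "worker" are hypothetical names chosen for illustration; the article does not describe a specific API or routing rule.

```python
from dataclasses import dataclass

# Assumption: these task kinds stand in for the "high-load" work the article
# assigns to the large model (planning, synthesis, verification).
HEAVY_KINDS = {"plan", "synthesize", "verify"}


@dataclass
class Task:
    kind: str
    prompt: str


def pick_model(task: Task) -> str:
    """Send heavy reasoning to the orchestrator, everything else to a worker."""
    return "orchestrator" if task.kind in HEAVY_KINDS else "worker"


def dispatch(tasks):
    """Group task prompts by the model tier that should handle them."""
    assignments = {"orchestrator": [], "worker": []}
    for task in tasks:
        assignments[pick_model(task)].append(task.prompt)
    return assignments


tasks = [
    Task("plan", "Break the migration into steps"),
    Task("tool_call", "Run the test suite"),
    Task("verify", "Check the migration result"),
    Task("execute", "Rename the config file"),
]
assignments = dispatch(tasks)
```

In this toy run, two prompts go to the large model and two to a lightweight worker, so the expensive model is only invoked where deep reasoning is needed.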
In terms of performance, Nemotron 3 Ultra posts strong results on several agent-focused benchmarks.
It scored 91% on PinchBench (agent productivity), 40% on EnterpriseOps-Gym (long-horizon planning), and 67% on Terminal-Bench 2.0 (code generation).
Despite having fewer active parameters, its overall reasoning ability matches or surpasses mainstream open-source large models such as GLM 5.1, Kimi K2.6, and Qwen3.5.
Five major technological innovations: 5x throughput boost, 30% cost reduction
To achieve this performance and speed, NVIDIA built five core technical innovations into Nemotron 3 Ultra.
The first is hybrid Mamba-Transformer layers, which combine Mamba's efficiency on long sequences with the Transformer's precise fact recall.
The second is support for NVFP4 quantization, which lets the model weights be deployed on Hopper, Blackwell, and Ampere GPU architectures.
On Blackwell, this can raise throughput (output speed) by up to 5x compared with the traditional BF16 format.
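To give a sense of scale, the sketch below estimates the weight memory of a 550-billion-parameter model in BF16 versus a 4-bit block-scaled format. It assumes 4-bit values with one 8-bit scale shared by each block of 16 weights (about 4.5 bits per weight), which is my understanding of the NVFP4 layout rather than something stated in the article, and it ignores activations, the KV cache, and any layers kept in higher precision. It covers memory only and does not by itself account for the reported throughput gain.

```python
def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


TOTAL_PARAMS = 550e9  # total parameter count quoted in the article

BF16_BITS = 16
# Assumption: 4-bit value plus one 8-bit scale shared by 16 weights.
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight

bf16_gb = weight_bytes(TOTAL_PARAMS, BF16_BITS) / 1e9    # about 1100 GB
nvfp4_gb = weight_bytes(TOTAL_PARAMS, NVFP4_BITS) / 1e9  # about 309 GB
memory_ratio = bf16_gb / nvfp4_gb                        # about 3.6x smaller

print(f"BF16:  {bf16_gb:,.0f} GB")
print(f"NVFP4: {nvfp4_gb:,.0f} GB ({memory_ratio:.2f}x smaller)")
```

Smaller weights mean fewer bytes moved per generated token, which is one plausible contributor to higher output speed on hardware with native 4-bit support.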
Additionally, the model integrates LatentMoE (an efficient expert routing designed for complex workloads), multi-token prediction (MTP, predicting multiple future tokens in a single forward pass to speed up long text generation), and multi-teacher online distillation (MOPD), among other cutting-edge techniques.
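Why multi-token prediction speeds up generation can be seen with a simplified acceptance model. The sketch assumes each forward pass produces one guaranteed token plus a fixed number of extra predicted tokens, that each extra token is accepted independently with the same probability, and that acceptance stops at the first rejection. These assumptions and the function name are illustrative; the article does not describe how Nemotron 3 Ultra verifies its predicted tokens.

```python
def expected_tokens_per_pass(num_extra: int, accept_prob: float) -> float:
    """Expected tokens emitted per forward pass under the toy model.

    One token is always produced; the i-th extra token counts only if it
    and all extra tokens before it were accepted (probability accept_prob**i).
    """
    if num_extra < 0 or not 0.0 <= accept_prob <= 1.0:
        raise ValueError("num_extra must be >= 0 and accept_prob in [0, 1]")
    return 1.0 + sum(accept_prob ** i for i in range(1, num_extra + 1))


# Hypothetical settings, not measured values.
baseline = expected_tokens_per_pass(0, 0.8)   # plain one-token decoding
with_mtp = expected_tokens_per_pass(3, 0.8)   # three extra tokens per pass
speedup = with_mtp / baseline
```

Under this model the benefit grows with the acceptance rate and flattens as more extra tokens are added, since later predictions are accepted less often.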
These innovations significantly reduce total token consumption during task processing, lowering enterprise agent task costs by up to 30%.
Fully open source, accelerating enterprise AI application deployment
In terms of training data, Nemotron 3 Ultra was pre-trained on more than 10 trillion tokens, supplemented with over 212 billion domain-specific tokens (including legal documents, Wikipedia-style texts, and recent GitHub code).
NVIDIA emphasizes that the model is fully open source, licensed under the highly flexible OpenMDW-1.1, and provides the community with complete model weights, training recipes, and data pipelines.
Currently, developers can access and deploy Nemotron 3 Ultra on mainstream platforms such as Hugging Face, NVIDIA Build, and NIM.
With its strong long-context performance (95% on RULER at 1M tokens) and high cost-effectiveness, the model is expected to become a powerful tool for enterprises automating customer service, supply chain management, IT security, and chip design verification.