NVIDIA Cosmos 3 Physical AI World Model officially opens for download today, with two versions available on Hugging Face

CoinWorld News: NVIDIA today officially released the model weights for Cosmos 3, its physical AI world model. The first batch includes two versions: super (64.6 billion parameters) and nano (15.7 billion parameters). Both are now available on Hugging Face (ungated, so they can be downloaded directly) and on build.nvidia.com, and both can be deployed as NVIDIA NIM microservices.
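To put the two stated parameter counts (64.6 billion and 15.7 billion) in perspective, the following is a back-of-the-envelope sketch of how much memory the weights alone would occupy at a few common storage precisions. It assumes dense storage at a fixed number of bytes per parameter; the precisions listed and the helper names `weight_gb` and `size_ratio` are illustrative assumptions, not figures or tooling published by NVIDIA.

```python
# Rough weight-memory estimate derived from parameter counts alone.
# Assumption: weights are stored densely at a fixed number of bytes per parameter.
# These are illustrative estimates, not NVIDIA's published hardware requirements.

PARAMS = {"super": 64.6e9, "nano": 15.7e9}  # parameter counts quoted in the article
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}  # assumed storage precisions


def weight_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def size_ratio() -> float:
    """Return how many times larger super is than nano, by parameter count."""
    return PARAMS["super"] / PARAMS["nano"]


if __name__ == "__main__":
    for name, count in PARAMS.items():
        for precision, nbytes in BYTES_PER_PARAM.items():
            print(f"{name:5s} {precision}: {weight_gb(count, nbytes):6.1f} GB")
    print(f"super/nano parameter ratio: {size_ratio():.2f}")
```

Under these assumptions, 16-bit weights come to roughly 129 GB for super and 31 GB for nano, before counting activations or any runtime overhead, and super is a little over four times the size of nano.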

Cosmos 3 is positioned as a fully multimodal (omnimodel) world foundation model for physical AI. It is built on a new hybrid Transformer architecture (mixture of transformers) and natively understands and generates text, images, video, environmental sound, and actions.

The super version targets post-training of robotics and autonomous driving models that require the highest physical accuracy, while the nano version is designed for low-latency scenarios involving high-quality video and action reasoning. An edge version, intended for real-time inference on edge devices, is expected to be released soon.

NVIDIA says Cosmos 3 is the “world’s first fully open multimodal model.” Developers can freely download it, post-train it, and convert it into proprietary models.

OwlChainViewer
· 3h ago
The hybrid Transformer architecture has some substance; a unified understanding of all modalities is finally not just empty talk.
Stop-LossInTheEveningGlow
· 3h ago
A big win for the wait-and-see camp: the parameter scale rumored earlier turned out to be true.
SlowerThanBlock
· 4h ago
Physical AI world model + no gating, will other major companies follow? If not, they'll fall behind.
GateUser-470bc925
· 4h ago
Parameters differ by a factor of four; the trade-off between accuracy and efficiency depends on the scenario.
NeonFusionIceCream
· 4h ago
How exactly is the mixture of transformers combined? Waiting for a technical blog breakdown.
QuantsAndCats
· 4h ago
For post-training autonomous driving, use Super; for video generation, use Nano; roles are clearly defined.
L2ArbitrageYoungster
· 4h ago
Many people haven't noticed that native support for ambient sound is here. Multimodal finally has ears.
ArbitrageIsn'tAsGoodAsGetting
· 4h ago
NIM microservice deployment is very friendly to small and medium-sized enterprises, since they don't need to set up complex infrastructure themselves.