NVIDIA releases Gamma-World, a multi-agent world model supporting four-player collaboration and real-time generation at 24 FPS

ME AI News: According to BlockBeats monitoring, NVIDIA, together with Tsinghua University, the University of Toronto, and the Vector Institute, has released Gamma-World, a multi-agent generative world model that overcomes the long-standing limitation of virtual-environment simulation to one- or two-player interaction. The team has published a project page and a paper, and plans to open-source the code and weights soon.

The model introduces two mechanisms: a high-dimensional extension of rotary position encoding, and hub tokens that relay information between players. While keeping each player independently controllable, it enables, for the first time, zero-shot transfer from two-player to four-player collaboration without retraining. The central challenge for a multiplayer world model is to keep each player independently controllable while preventing their actions from conflicting.

To address this, the team designed Simplex Rotary Agent Encoding, which extends the classic Rotary Position Embedding (RoPE) into a high-dimensional angular space. The encoding treats all players fully symmetrically and removes any dependence on fixed player IDs, so each player can be referenced and controlled independently.
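One way to picture a symmetric, ID-free agent encoding is sketched below. This is a hypothetical illustration, not the paper's method: it assumes each agent is assigned a vertex of a regular simplex (so every pair of agents is equally far apart) and that, as in RoPE, each pair of feature channels is rotated by an angle that is linear in that vertex. The names `simplex_vertices`, `rotate_by_agent`, `agent_attention_score`, and the `FREQS` values are invented for this example.

```python
import math

def simplex_vertices(n_agents, dim):
    """Hypothetical agent codes: centered standard basis vectors, zero-padded to `dim`.
    Every pair of vertices is exactly sqrt(2) apart, so no agent is privileged."""
    assert n_agents <= dim
    return [[(1.0 if i == j else 0.0) - (1.0 / n_agents if j < n_agents else 0.0)
             for j in range(dim)] for i in range(n_agents)]

def rotate_by_agent(x, vertex, freqs):
    """RoPE-style rotation: channel pair k of `x` is rotated by the angle freqs[k] . vertex.
    `freqs` holds one direction vector (length `dim`) per channel pair."""
    out = []
    for k, w in enumerate(freqs):
        angle = sum(wi * vi for wi, vi in zip(w, vertex))
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * k], x[2 * k + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Illustrative (made-up) parameters: 2 channel pairs, agent space of dimension 4,
# which is enough room for up to 4 agents.
FREQS = [[0.3, 1.1, -0.7, 0.5], [2.0, -0.4, 0.9, 0.1]]

def agent_attention_score(q, k, agent_q, agent_k, n_agents):
    """Attention logit between a query from agent `agent_q` and a key from agent `agent_k`."""
    verts = simplex_vertices(n_agents, len(FREQS[0]))
    return dot(rotate_by_agent(q, verts[agent_q], FREQS),
               rotate_by_agent(k, verts[agent_k], FREQS))
```

Because the rotation angle is linear in the vertex, the score depends only on the difference between the two agents' vertices (the same relative property RoPE has for token positions), and since all vertices are equidistant, no agent ordering is baked in. In this sketch the same `FREQS` serve two or four agents, which loosely mirrors the two-to-four-player transfer described above; the source does not say whether Gamma-World works this way.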

To keep computation from growing quadratically with the number of players, the approach introduces a Sparse Hub Attention mechanism. Interaction information is routed through learnable hub tokens, which reduces the cost of cross-player attention from quadratic to linear in the number of players.
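The scaling argument can be checked by counting attention pairs. The sketch below is a hypothetical layout, not the paper's exact design: it assumes player tokens attend only within their own player and to the hub tokens, while hub tokens attend to everything, so all cross-player information must pass through the hubs. The function names and token counts are illustrative.

```python
def hub_attention_mask(num_players, tokens_per_player, num_hubs):
    """Boolean mask: mask[i][j] is True when query token i may attend to key token j.
    Hypothetical layout: player tokens first (grouped by player), then hub tokens."""
    n_player_tokens = num_players * tokens_per_player
    n = n_player_tokens + num_hubs

    def owner(idx):
        # Player index for a player token, or None for a hub token.
        return idx // tokens_per_player if idx < n_player_tokens else None

    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            oi, oj = owner(i), owner(j)
            if oi is None or oj is None:
                mask[i][j] = True  # hubs gather from and broadcast to every player
            else:
                mask[i][j] = oi == oj  # players never attend to each other directly
    return mask

def allowed_pairs(mask):
    """Number of (query, key) pairs that attention actually computes."""
    return sum(sum(row) for row in mask)

def dense_pairs(num_players, tokens_per_player):
    """Pairs computed by full attention over all player tokens."""
    return (num_players * tokens_per_player) ** 2
```

With 4 tokens per player and 2 hub tokens, the hub mask allows 32N + 4 pairs for N players (68, 132, and 260 for 2, 4, and 8 players), while dense attention over player tokens needs (4N)^2 pairs (64, 256, and 1024). The hub version is not cheaper for two players, but its cost grows linearly rather than quadratically. A real implementation would not materialize the full n-by-n mask as this toy does; it is built here only to count pairs.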

For generation speed, the team distilled a high-latency diffusion teacher model into a causal student model and, combined with key-value (KV) caching, achieved real-time, action-responsive generation at 24 frames per second (FPS). In evaluations in multiplayer game environments, the new model clearly outperformed conventional slot-based and dense-attention baselines in the visual realism of generated frames, the controllability of action responses, and cross-player consistency.
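The benefit of a causal student with KV caching comes from never recomputing the past: each new frame's keys and values are appended to a cache, and only the new frame's query attends over it. The toy below is a generic illustration of KV caching with one attention head over plain Python lists and frames reduced to single vectors; it is not Gamma-World's architecture, and all names are hypothetical. The 24 FPS target itself implies a budget of roughly 41.7 ms per frame.

```python
import math

FPS = 24
FRAME_BUDGET_MS = 1000 / FPS  # about 41.7 ms per frame at 24 FPS

def attend(q, keys, values):
    """Scaled dot-product attention of one query over a list of keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

class KVCache:
    """Stores keys/values of past frames so each new frame attends to history
    without recomputing it (causal: a frame never sees future frames)."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append the current frame's key/value, then attend over all frames so far.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Each `step` produces the same result as re-running causal attention over the whole history, but the keys and values of earlier frames are computed once and reused, which is what makes a fixed per-frame time budget reachable.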

(Source: BlockBeats)

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.