Tencent open-sources the HunYuan World Model 2.0: a one-sentence prompt can generate an explorable 3D world, directly importable into Unity and Unreal Engine.


News, April 16 (UTC+8). According to BlockBeats, Tencent has officially released and open-sourced its HunYuan 3D World Model 2.0 (HY-World 2.0).

This is a multimodal world model framework that supports text, single-image inputs, multi-view image inputs, and video inputs. The output is not video—instead, it produces editable 3D assets (mesh models, 3D Gaussian splats, and point clouds) that can be directly imported into Unity, Unreal Engine, and NVIDIA Isaac Sim.
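The "editable mesh" output described above can be serialized in standard interchange formats; Wavefront OBJ is one such format that both Unity and Unreal Engine import. A minimal illustrative sketch (the triangle below is placeholder data, not real model output):

```python
# Minimal sketch: serializing a triangle mesh as Wavefront OBJ, a text
# format importable by Unity and Unreal Engine. Placeholder data only.

def write_obj(vertices, faces):
    """Serialize a triangle mesh to Wavefront OBJ text."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are 1-based.
    lines += ["f " + " ".join(str(i + 1) for i in tri) for tri in faces]
    return "\n".join(lines) + "\n"

obj_text = write_obj(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    faces=[(0, 1, 2)],
)
print(obj_text)
```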

Model weights and code are open-sourced on GitHub and Hugging Face.

The fundamental difference from video world models such as Genie 3 and Cosmos is that video world models generate pixel-level video that disappears after playback and cannot be edited; HY-World 2.0 generates persistent 3D assets that support free walking, physical collisions, and secondary editing. Tencent summarizes this difference in its technical report as “watch a video and it disappears” versus “build a world that is permanently preserved.”

The generated scenes render in real time on consumer-grade GPUs, and inference runs in a single pass, unlike video world models, which must regenerate every frame.

Technically, it is divided into four stages: first, use HY-Pano 2.0 to generate a 360-degree panorama from the input; then use WorldNav for trajectory planning; next, use WorldStereo 2.0 to expand the world along the trajectory; and finally, use WorldMirror 2.0 to reconstruct all generated segments into a unified 3D scene.
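The four stages above compose sequentially; the following sketch shows the data flow only. The function names and return values are illustrative assumptions, not the real HY-World 2.0 API:

```python
# Hypothetical sketch of the four-stage HY-World 2.0 pipeline.
# All names and data shapes are assumptions for illustration.

def generate_panorama(prompt):    # Stage 1: HY-Pano 2.0
    # Turn a text/image prompt into a 360-degree panorama.
    return {"type": "panorama", "source": prompt}

def plan_trajectory(panorama):    # Stage 2: WorldNav
    # Plan an exploration trajectory through the panoramic scene.
    return {"type": "trajectory", "scene": panorama}

def expand_world(trajectory):     # Stage 3: WorldStereo 2.0
    # Expand the world along the planned trajectory into segments.
    return {"type": "segments", "path": trajectory}

def reconstruct_scene(segments):  # Stage 4: WorldMirror 2.0
    # Fuse all generated segments into one unified 3D scene
    # (mesh, Gaussian splats, or point cloud).
    return {"type": "scene3d", "parts": segments}

def hy_world_pipeline(prompt):
    return reconstruct_scene(expand_world(plan_trajectory(generate_panorama(prompt))))

scene = hy_world_pipeline("a rainy night market street")
print(scene["type"])  # scene3d
```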

Tencent claims HY-World 2.0 is the first open-source 3D world model to reach state-of-the-art (SOTA) level, with results comparable to closed-source commercial products such as Marble. However, only the code and weights for WorldMirror 2.0 (the 3D reconstruction module, about 1.2 billion parameters) have been released so far; the other three modules (panoramic generation, trajectory planning, and world expansion) are labeled "coming soon."

For game developers, this means level prototypes and maps can be generated from a single sentence, saving a large amount of manual modeling time. For embodied AI researchers, it greatly reduces the cost of generating simulated training environments in bulk from photos.

Tencent has also launched an online demo where users can control a character and freely explore the generated streets and buildings.

(Source: BlockBeats)
