Microsoft World-R1: Teaching video models to "understand" 3D with reinforcement learning, no architecture changes, 3D PSNR up by as much as 10 dB


AIMPACT News, April 28 (UTC+8): According to monitoring by Beating, a team from Microsoft Research and Zhejiang University has proposed World-R1, which uses reinforcement learning to teach text-to-video models 3D geometric consistency without modifying the model architecture or relying on 3D datasets.

Core idea: after a video is generated, a pre-trained 3D foundation model, Depth Anything 3, reconstructs the scene as 3D Gaussians (3DGS). The scene is then rendered from a new viewpoint and compared with the original video. Three signals are combined into a reward: reconstruction error, trajectory deviation, and the semantic plausibility of the new view (rated by Qwen3-VL). The reward is fed back to the video model via Flow-GRPO, a reinforcement learning algorithm adapted for flow-matching models.
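The reward combination described above can be sketched roughly as follows. This is a minimal illustration, not the World-R1 code: the function names, the equal weights, and the sign convention (errors penalized, semantic score rewarded) are all assumptions. The second function shows the group-relative normalization that GRPO-style methods generally apply to rewards from a batch of samples drawn for the same prompt.

```python
from statistics import mean, pstdev


def combined_reward(recon_error, traj_deviation, semantic_score,
                    w_recon=1.0, w_traj=1.0, w_sem=1.0):
    """Hypothetical scalar reward: lower errors and a higher semantic score are better.

    The weights are placeholders; the source does not state how the terms are balanced.
    """
    return -w_recon * recon_error - w_traj * traj_deviation + w_sem * semantic_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples generated from the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical videos sampled for one prompt:
# (reconstruction error, trajectory deviation, semantic score)
samples = [(0.2, 0.1, 0.9), (0.5, 0.2, 0.8), (0.9, 0.3, 0.9), (0.3, 0.1, 0.8)]
rewards = [combined_reward(*s) for s in samples]
advantages = group_relative_advantages(rewards)
```

Samples with above-average reward in the group receive a positive advantage and are reinforced; the rest are pushed down.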

The base model is the open-source Wan 2.1 (1.3B and 14B), and each size is trained separately to produce World-R1-Small and World-R1-Large. The training data consists of only about 3,000 text-only prompts generated by Gemini, with no 3D assets. Every 100 training steps, one round of "dynamic fine-tuning" is inserted, in which the 3D rewards are temporarily disabled and only the image-quality rewards are kept. This prevents the model from suppressing non-rigid dynamics, such as character motion, in pursuit of geometric rigidity.
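A minimal sketch of that reward schedule, under stated assumptions: the reward names are hypothetical labels, and the length of a "dynamic fine-tuning" round (`round_length`) is not given in the source, so it defaults to a single step here.

```python
def active_rewards(step, period=100, round_length=1):
    """Return the reward groups enabled at a given training step.

    Every `period` steps, a dynamic fine-tuning round drops the 3D rewards
    and keeps only image quality. `round_length` is an assumed parameter.
    """
    in_dynamic_round = step > 0 and step % period < round_length
    if in_dynamic_round:
        return ("image_quality",)
    return ("geometry_3d", "image_quality")


# Steps 100, 200, 300, ... train on image quality alone.
schedule = {step: active_rewards(step) for step in (99, 100, 101, 200)}
```

Interleaving rounds this way lets the geometric reward dominate most of training while periodically removing the pressure toward a static, rigid scene.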

On the 3D consistency metric (3DPSNR), World-R1-Large improves by 7.91 dB over the base Wan 2.1 14B, and the Small version improves by 10.23 dB. General video quality as measured by VBench does not degrade and in fact improves.
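To put the decibel gains in perspective: assuming 3DPSNR follows the standard definition PSNR = 10 * log10(MAX^2 / MSE), which the source does not spell out, a gain of d dB corresponds to the mean squared error shrinking by a factor of 10^(d/10). The helper name below is hypothetical.

```python
def mse_reduction_factor(psnr_gain_db):
    """Factor by which MSE shrinks for a given PSNR gain, under the standard PSNR formula."""
    return 10 ** (psnr_gain_db / 10)


large_factor = mse_reduction_factor(7.91)   # World-R1-Large gain
small_factor = mse_reduction_factor(10.23)  # World-R1-Small gain
```

Under that assumption, the reported gains correspond to roughly a 6x (Large) and 10x (Small) reduction in squared re-rendering error.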

In a blind test with 25 participants, World-R1 achieved a 92% win rate on geometric consistency and 86% on overall preference. The code has been open-sourced on GitHub under the CC BY-NC-SA 4.0 license. (Source: BlockBeats)
