Alibaba launches three Qwen-Robot models: robot navigation, control, and physics simulation in one package

Alibaba's Qwen team has released the Qwen-Robot Suite, a set of three foundation models for navigation, manipulation, and physical-world simulation. The suite ranks first on multiple robot benchmarks and is being described as the "Android moment" for robotics.

Table of Contents

  • Qwen-RobotNav: The Five-in-One Navigation Model
  • Qwen-RobotManip: Cross-Robot Manipulation
  • Qwen-RobotWorld: Language as a Universal Interface
  • How Does It Compare to Western Labs?

(Source: Decrypt, Qwen Official Blog)

Alibaba's Qwen team announced on Tuesday the release of the Qwen-Robot Suite, an "embodied intelligence full stack" composed of three core models. Qwen-RobotNav handles movement and navigation, Qwen-RobotManip handles mechanical manipulation, and Qwen-RobotWorld simulates the physical environment. Each model operates independently, but together they are framed as the "Android moment" for robotics: an operating-system layer, not hardware.

Qwen-RobotNav: The Five-in-One Navigation Model

The navigation model covers five tasks: instruction following, point-goal navigation, object search, target tracking, and autonomous driving. Each task calls for a different visual memory strategy. Most models commit to a single strategy, but Qwen-RobotNav exposes a parameterizable interface (token budget, temporal decay, and per-camera weight) that lets planners reconfigure memory during execution.
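To make the three parameters concrete, here is a minimal sketch of how a token budget could be divided across cameras and past frames. It is not Qwen-RobotNav's actual interface: the names `MemoryConfig` and `allocate_tokens`, and the exponential form of the decay, are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryConfig:
    """Hypothetical knobs mirroring the three parameters named in the article."""
    token_budget: int = 512        # total visual tokens kept in memory
    temporal_decay: float = 0.5    # multiplier applied per step into the past
    camera_weights: dict = field(default_factory=lambda: {"front": 1.0})


def allocate_tokens(cfg, num_frames):
    """Split cfg.token_budget across (camera, frame_age) slots.

    frame_age 0 is the newest frame. A slot's share is proportional to
    camera_weight * temporal_decay ** frame_age (an assumed weighting).
    Largest-remainder rounding keeps the total equal to the budget.
    """
    raw = {
        (cam, age): weight * cfg.temporal_decay ** age
        for cam, weight in cfg.camera_weights.items()
        for age in range(num_frames)
    }
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    exact = {slot: cfg.token_budget * w / total for slot, w in raw.items()}
    alloc = {slot: math.floor(share) for slot, share in exact.items()}
    leftover = cfg.token_budget - sum(alloc.values())
    # Hand the remaining tokens to the slots with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for slot in by_remainder[:leftover]:
        alloc[slot] += 1
    return alloc


cfg = MemoryConfig(token_budget=100, temporal_decay=0.5,
                   camera_weights={"front": 1.0, "rear": 0.5})
alloc = allocate_tokens(cfg, num_frames=3)
```

Under these assumptions, a planner could favor long-horizon memory for object search by raising `temporal_decay` toward 1.0, or favor recency for target tracking by lowering it, without changing the model itself.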

Trained on 15.6 million samples, the model achieved a 76.5% success rate on the VLN-CE RxR benchmark (vision-and-language navigation in continuous environments) and 90% on EVT-Bench (moving-target tracking).

Qwen-RobotManip: Cross-Robot Manipulation

Robots differ widely in how they represent actions: the Franka robotic arm uses joint angles, the ALOHA dual-arm robot uses gripper position and orientation, and humanoid robots use full-body coordinates. Alibaba synthesized approximately 38,100 hours of training data from open-source robot datasets and human videos, without relying on private data collection.
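The article does not say how Qwen-RobotManip reconciles these representations. One common approach in cross-embodiment learning, shown here purely as an illustration, is to pad each robot's action vector to a shared width and carry a mask marking the valid entries. The layout names and the ALOHA dimension below are assumptions for this sketch.

```python
# Illustrative layouts only; names and dimensions are assumptions for this sketch.
ACTION_DIMS = {
    "franka_joint": 7,    # seven joint angles
    "aloha_ee_pose": 14,  # assumed: 2 arms x (xyz + roll/pitch/yaw + gripper)
}
MAX_DIM = max(ACTION_DIMS.values())


def to_shared(embodiment, action):
    """Pad a robot-specific action to MAX_DIM and return (padded, mask)."""
    dim = ACTION_DIMS[embodiment]
    if len(action) != dim:
        raise ValueError(f"{embodiment} expects {dim} values, got {len(action)}")
    padded = [float(a) for a in action] + [0.0] * (MAX_DIM - dim)
    mask = [True] * dim + [False] * (MAX_DIM - dim)
    return padded, mask


def from_shared(embodiment, padded):
    """Recover the robot-specific action by dropping the padding."""
    return list(padded[:ACTION_DIMS[embodiment]])


padded, mask = to_shared("franka_joint", [0.1] * 7)
restored = from_shared("franka_joint", padded)
```

With a shared width and mask, a single model head can emit actions for any listed robot, and a training loss can ignore the padded positions.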

The model ranked first in the RoboChallenge Table30-v1 benchmark, surpassing previous methods by 20%.

Qwen-RobotWorld: Language as a Universal Interface

This is the most ambitious of the three: a language-conditioned video world model that uses natural language as a universal action interface. A command such as "Pick up the red cup and pour water into the flower" applies across grippers, autonomous vehicles, and navigation agents.

The embodied world knowledge corpus comprises 8.6 million video-text pairs and 200 million frames, spanning manipulation (5.9 million samples, 1,300+ skills, 20+ robot forms), autonomous driving (Waymo, NVIDIA PhysicalAI-AD), indoor navigation, and human-to-robot transfer across 14 mechanisms. The model ranks first on the EWMBench and DreamGen Bench benchmarks and scores full marks on physics-consistency tests covering Newton's laws, conservation of mass, fluid dynamics, and gravity.

How Does It Compare to Western Labs?

Western labs such as Google DeepMind, NVIDIA, Figure, and Physical Intelligence are pursuing similar goals, but most focus on either navigation or manipulation rather than a unified, modular kit. Alibaba's vertical integration from chips to applications means it controls the entire ecosystem, and all of these models are open source.

However, developers caution that these are software models, not physical robots, and real-world deployment in household scenarios will still take several years. Alibaba has not yet announced pricing, timelines, or client lists beyond pilot programs.
