Alibaba's Qwen Team Releases Qwen-Robot Suite: Three Foundation Models for Navigation, Manipulation, and Physical World Simulation Rank First on Multiple Robot Benchmarks, Seen as Robotics' "Android Moment"

(Source: Decrypt, Qwen Official Blog)

Alibaba's Qwen team announced on Tuesday the release of the Qwen-Robot Suite, an "embodied intelligence full stack" composed of three core models. Qwen-RobotNav handles movement and navigation, Qwen-RobotManip handles mechanical manipulation, and Qwen-RobotWorld simulates the physical environment. Each model can operate independently, but together they are described as the "Android moment" for robotics: an operating system, not hardware.

Qwen-RobotNav: The Five-in-One Navigation Model

The navigation model integrates five tasks: instruction following, goal-point navigation, object search, target tracking, and autonomous driving. Each requires a different visual memory strategy. Most models commit to a single strategy, whereas Qwen-RobotNav exposes a parameterizable interface (token budget, temporal decay, and per-camera weight) that lets planners reconfigure the model during execution.
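To make the three parameters concrete, here is a minimal Python sketch of how a planner might use them to decide which camera frames stay in visual memory. It is an illustration only, not Qwen-RobotNav's actual API: the names `Frame`, `MemoryConfig`, and `select_frames`, the exponential decay formula, and the greedy selection rule are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    camera: str       # hypothetical camera name, e.g. "front" or "rear"
    timestamp: float  # capture time in seconds
    tokens: int       # visual tokens this frame would occupy


@dataclass
class MemoryConfig:
    token_budget: int                 # total visual tokens allowed in memory
    temporal_decay: float             # larger values forget old frames faster
    camera_weights: dict[str, float]  # relative importance of each camera


def select_frames(frames, now, cfg):
    """Greedily keep the highest-scoring frames that fit the token budget."""

    def score(frame):
        age = max(0.0, now - frame.timestamp)
        weight = cfg.camera_weights.get(frame.camera, 1.0)
        # Assumed scoring rule: camera weight times exponential decay with age.
        return weight * math.exp(-cfg.temporal_decay * age)

    kept, used = [], 0
    for frame in sorted(frames, key=score, reverse=True):
        if used + frame.tokens <= cfg.token_budget:
            kept.append(frame)
            used += frame.tokens
    # Return the survivors in chronological order.
    return sorted(kept, key=lambda f: f.timestamp)


frames = [
    Frame("front", 9.0, 64),
    Frame("front", 0.0, 64),
    Frame("rear", 9.0, 64),
]

# A driving-style configuration: recent frames from every camera matter.
driving = MemoryConfig(128, 0.5, {"front": 1.0, "rear": 0.2})
# A search-style configuration: long front-camera history, rear camera ignored.
search = MemoryConfig(128, 0.0, {"front": 1.0, "rear": 0.0})

driving_memory = select_frames(frames, now=10.0, cfg=driving)
search_memory = select_frames(frames, now=10.0, cfg=search)
```

With the same three frames, the driving-style configuration keeps the two most recent frames (one per camera), while the search-style configuration keeps both front-camera frames regardless of age. Swapping the configuration object mid-run is the kind of reconfiguration the article describes.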

Trained on 15.6 million samples, the model achieved a success rate of 76.5% on the VLN-CE RxR benchmark (vision-and-language navigation in continuous environments) and 90% on EVT-Bench (moving-target tracking).

Qwen-RobotManip: Cross-Robot Manipulation

Different robots have vastly different action representations: the Franka robotic arm uses joint angles, the ALOHA dual-arm robot uses gripper position and orientation, and humanoid robots use full-body coordinates. Alibaba synthesized approximately 38,100 hours of training data from open-source robot datasets and human videos, without relying on private data collection.
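The article does not say how Qwen-RobotManip reconciles these representations. One common approach in cross-embodiment learning, sketched below purely as an assumption, is to pad each robot's action into a shared fixed-width vector and record which entries are real with a mask. The names `UnifiedAction`, `to_unified`, and `from_unified`, the width `MAX_DIM`, and the example action sizes are all hypothetical.

```python
from dataclasses import dataclass

MAX_DIM = 32  # assumed upper bound on action dimensions across robots


@dataclass(frozen=True)
class UnifiedAction:
    embodiment: str  # label for the robot type, e.g. "franka" or "aloha"
    values: tuple    # action values padded with zeros up to MAX_DIM
    mask: tuple      # True where the corresponding value is a real dimension


def to_unified(embodiment, raw):
    """Pad a robot-specific action into the shared fixed-width layout."""
    if len(raw) > MAX_DIM:
        raise ValueError(f"action has {len(raw)} dims, limit is {MAX_DIM}")
    pad = MAX_DIM - len(raw)
    values = tuple(float(v) for v in raw) + (0.0,) * pad
    mask = (True,) * len(raw) + (False,) * pad
    return UnifiedAction(embodiment, values, mask)


def from_unified(action):
    """Recover the robot-specific action by dropping the padded entries."""
    return [v for v, keep in zip(action.values, action.mask) if keep]


# Illustrative sizes: 7 joint angles for a single arm, and 16 values for a
# dual-arm robot (per arm: 3 position + 4 orientation + 1 gripper).
franka = to_unified("franka", [0.1] * 7)
aloha = to_unified("aloha", [0.0] * 16)
```

Both actions now share one shape, so a single model head could emit either, and `from_unified` restores the original dimensionality before the command is sent to a specific robot.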

The model ranked first on the RoboChallenge Table30-v1 benchmark, outperforming previous methods by 20%.

Qwen-RobotWorld: Language as a Universal Interface

This is the most ambitious of the three models: a language-conditioned video world model that uses natural language as a universal action interface. A command such as "Pick up the red cup and pour water into the flower" applies across grippers, autonomous vehicles, and navigation agents.

The embodied world knowledge corpus contains 8.6 million video-text pairs and 200 million frames. It spans manipulation (5.9 million samples, 1,300+ skills, 20+ robot forms), autonomous driving (Waymo, NVIDIA PhysicalAI-AD), indoor navigation, and human-to-robot transfer across 14 mechanisms. The model ranks first on the EWMBench and DreamGen Bench benchmarks and scores full marks on physics consistency tests covering Newton's laws, conservation of mass, fluid dynamics, and gravity.

How Does It Compare to Western Labs?

Western labs such as Google DeepMind, NVIDIA, Figure, and Physical Intelligence are pursuing similar goals, but most focus on either navigation or manipulation rather than a unified, modular kit. Alibaba's vertical integration from chips to applications means it controls the entire ecosystem, and all three models are open source.

However, developers caution that these are software models, not physical robots, and real-world deployment in household scenarios will still take several years. Alibaba has not yet announced pricing, timelines, or client lists beyond pilot programs.
