Alibaba introduces AI models for robot control - ForkLog

Alibaba introduces AI models for robot control

Alibaba has introduced the Qwen-Robot Suite, a collection of AI models for robots and tasks in the physical environment: Qwen-RobotNav for navigation, Qwen-RobotManip for object manipulation, and Qwen-RobotWorld for predicting how a scene will evolve. The team described the project as a “full stack for embodied intelligence.”

📣 Introducing the Qwen-Robot Suite — Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence.

🧭 Qwen-RobotNav — the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal,… pic.twitter.com/noumjTtTeS

— Qwen (@Alibaba_Qwen) June 16, 2026

These are software models intended to help physical agents perceive their surroundings, plan actions, and execute natural-language commands. The Qwen-Robot Suite is already undergoing pilot testing with select corporate clients of Alibaba Cloud in the robotics sector.

Why Alibaba is bringing Qwen into the physical world

Large language and multimodal models can already work with text, images, video, and speech, but that is not enough for robots. A physical agent must not only understand a command but also translate it into motion while accounting for space, object properties, sensor limitations, and the consequences of its actions.

Alibaba calls this direction physical AI, or “embodied AI.” In this approach, the model must work not only with digital data, but also with the physical environment: moving around, finding objects, controlling manipulators, and predicting what will happen after an action.

Qwen-RobotNav: five navigation tasks in one model

Qwen-RobotNav is responsible for navigation. The model combines five groups of tasks:

  • following instructions;
  • moving to a specified point;
  • object search;
  • tracking a target;
  • autonomous driving.

According to Alibaba, Qwen-RobotNav is built on Qwen3-VL and trained on 15.6 million samples related to route planning and visual-language reasoning.

The company reported a 76.5% success rate on VLN-CE RxR and 90% on EVT-Bench. Alibaba also said the model can operate as a tool inside larger agent-based systems: a high-level model plans the task, while Qwen-RobotNav handles the movement.
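The planner-plus-tool split described above can be sketched in a few lines. This is only an illustration under stated assumptions: `NavRequest`, `NavResult`, `stub_navigator`, `run_plan`, and the task labels are hypothetical names invented here, not part of any Qwen API, and the stub does no real navigation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NavRequest:
    task: str   # hypothetical label for one of the five task groups
    goal: str   # free-text goal, e.g. an instruction or an object name


@dataclass
class NavResult:
    success: bool
    evidence: str  # what the robot would report back to the planner


# Hypothetical task labels mirroring the list in the article.
NAV_TASKS = {
    "instruction_following",
    "point_goal",
    "object_search",
    "target_tracking",
    "autonomous_driving",
}


def stub_navigator(request: NavRequest) -> NavResult:
    """Stand-in for a navigation model; it only echoes the request."""
    if request.task not in NAV_TASKS:
        return NavResult(False, f"unsupported task: {request.task}")
    return NavResult(True, f"completed {request.task} for '{request.goal}'")


def run_plan(
    steps: List[NavRequest],
    navigate: Callable[[NavRequest], NavResult],
) -> List[NavResult]:
    """The planner owns the step list; the navigation tool owns movement."""
    results: List[NavResult] = []
    for step in steps:
        result = navigate(step)
        results.append(result)
        if not result.success:
            break  # stop early so the high-level model can re-plan
    return results


if __name__ == "__main__":
    plan = [
        NavRequest("object_search", "lost keys"),
        NavRequest("point_goal", "front door"),
    ]
    for outcome in run_plan(plan, stub_navigator):
        print(outcome)
```

Passing the navigator in as a callable keeps the planner independent of whichever movement model sits behind it, which is the point of treating navigation as a tool.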

In Alibaba’s demonstrations, scenarios include searching for a lost item indoors or checking whether a specific object in a building is open. In such tasks, the robot must not only move, but also collect visual evidence and return an answer to the user.

Qwen-RobotManip: object actions

Qwen-RobotManip is designed for physical actions on objects. The model should help robots pick up, move, and place items, as well as transfer skills across different types of devices.

One of the key problems in robotics is that robots describe actions differently. A manipulator, a bimanual platform, a robot with a gripper, or a mobile system uses different coordinates, joints, and command formats. Qwen-RobotManip attempts to bring this data into a common representation so that training on one type of robot can help others.
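To make the idea of a common representation concrete, here is a minimal sketch. The article does not describe the format Qwen-RobotManip actually uses, so everything below is an assumption: the `UnifiedAction` schema, the two made-up robot formats (`from_arm_a`, `from_arm_b`), and the 80 mm maximum gripper width are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UnifiedAction:
    """Hypothetical shared format: displacement in meters, gripper in [0, 1]."""
    delta_xyz_m: Tuple[float, float, float]
    gripper: float  # 0.0 = fully open, 1.0 = fully closed


def from_arm_a(dx_mm: float, dy_mm: float, dz_mm: float,
               gripper_width_mm: float,
               max_width_mm: float = 80.0) -> UnifiedAction:
    """Made-up robot A reports millimeters and a gripper opening width."""
    closed = 1.0 - gripper_width_mm / max_width_mm
    closed = min(1.0, max(0.0, closed))  # clamp to the valid range
    return UnifiedAction((dx_mm / 1000, dy_mm / 1000, dz_mm / 1000), closed)


def from_arm_b(delta_m: Tuple[float, float, float],
               gripper_closed: bool) -> UnifiedAction:
    """Made-up robot B already uses meters but has a binary gripper."""
    return UnifiedAction(tuple(delta_m), 1.0 if gripper_closed else 0.0)


if __name__ == "__main__":
    print(from_arm_a(10.0, 0.0, -5.0, gripper_width_mm=20.0))
    print(from_arm_b((0.01, 0.0, -0.005), gripper_closed=True))
```

Once both robots emit the same structure, demonstrations from either one can be pooled into a single training set, which is the benefit the article attributes to a shared representation.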

For training, Alibaba used more than 38,100 hours of data. This included 11,320 hours of open robotics data, 1,933 hours of first-person human action video, and 24,808 hours of synthetic robotic demonstrations generated from such videos.
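A quick check of these figures shows how the training mix is weighted. The three itemized components add up to 38,061 hours, slightly under the 38,100 headline number, so the headline is presumably rounded or includes data that is not itemized here. The short script below only uses the numbers quoted above; the dictionary keys are shorthand labels chosen for this example.

```python
# Hours of training data as itemized in the article.
hours = {
    "open robotics data": 11_320,
    "first-person human video": 1_933,
    "synthetic robot demonstrations": 24_808,
}

total = sum(hours.values())
shares = {name: value / total for name, value in hours.items()}

print(f"total itemized: {total} hours")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Synthetic demonstrations make up roughly two thirds of the itemized hours, while real first-person human video accounts for about 5%.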

The company claimed that the model ranked first in RoboChallenge Table30 v1 in the universal models track. According to Alibaba, Qwen-RobotManip also remained robust to new instructions and unfamiliar objects and transferred skills across different robots.

Qwen-RobotWorld: a world model for robots

Qwen-RobotWorld is a video-based world model controlled by natural language. It is designed to predict how a scene will unfold after a given action.

For example, the model receives the current observation and a text command, and then generates a probable future state of the environment. This approach can be used for manipulation, autonomous driving, navigation, planning, and creating synthetic training data for robots.
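The observation-plus-command interface described above lends itself to a simple planning loop: predict the outcome of each candidate command and pick the one that lands closest to the goal. The sketch below is a toy under stated assumptions. `toy_world_model` is a hand-written transition function standing in for a learned video model, the one-dimensional state is invented for brevity, and none of the names come from Qwen-RobotWorld.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, float]


def toy_world_model(state: State, command: str) -> State:
    """Hand-written stand-in for a learned predictor.

    Maps (current observation, text command) to a predicted next state.
    """
    step = {"move left": -1.0, "move right": 1.0, "stay": 0.0}[command]
    return {"x": state["x"] + step}


def choose_command(
    state: State,
    goal_x: float,
    commands: Iterable[str],
    predict: Callable[[State, str], State],
) -> str:
    """Pick the command whose predicted outcome is closest to the goal."""
    return min(commands, key=lambda c: abs(predict(state, c)["x"] - goal_x))


if __name__ == "__main__":
    options = ["move left", "stay", "move right"]
    print(choose_command({"x": 0.0}, 3.0, options, toy_world_model))
```

The planner never acts in the real environment while it compares options; it only queries the predictor, which is why a world model is also useful for generating synthetic training data.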

To train Qwen-RobotWorld, the team assembled the Embodied World Knowledge corpus. It includes 8.6 million video-text pairs and more than 200 million frames, covering more than 20 types of robotic platforms and over 500 action categories.

Alibaba stated that Qwen-RobotWorld ranked first in EWMBench and DreamGen Bench, and also outperformed all open models in WorldModelBench and PBench. The technical write-up also claims that the model is highly consistent with basic physical principles: motion, conservation of mass, fluid behavior, and gravity.

Still a long way from mass-market robots

Despite the claimed results, the Qwen-Robot Suite is still a set of models rather than a ready-made consumer robotics platform. Real-world deployment has to contend with sensor noise, actuator wear, perception errors, and a long tail of rare and unusual situations. Many of the benchmarks used to compare such systems are run in simulation or under limited experimental conditions.

Alibaba has also not disclosed pricing, a public launch timeline, or the list of clients already testing the Qwen-Robot Suite.

As a reminder, in April Alibaba Cloud introduced the agent model Qwen3.6-Plus with a 1 million token context window and support for external tools.
