Viral robot dance moves: a new solution to embodied intelligence's "chokehold" data problem
🤖 The hero image was created with the Zhixiang Future AI foundation model
How can embodied intelligence overcome its data scarcity? Zhixiang Future offers an answer: a "real + generated" collaborative approach.
By | Wang Han
Edited by | Mò Yǐng
On the stages of concerts and major galas, robot dance troupes dazzle the audience with synchronized, precisely timed choreography. This kind of uniformity is not only a triumph of hardware, but also the result of “well-trained” execution.
One of the core challenges in training embodied intelligence is enabling a model to learn, in a virtual environment, genuine interaction capabilities that obey physical laws. This has become a hurdle that many companies struggle to clear when deploying embodied intelligence.
Recently, the domestic AI company Zhixiang Future (HiDream), which focuses on AIGC video foundation models and applications, chose to launch a strategic cooperation with the embodied intelligence robot company Noitom Robotics. Through a method of “real data + virtual augmentation,” they provide the industry with high-quality, scalable embodied training data.
This cross-industry collaboration model also offers a completely new way of thinking to break through the industry’s bottlenecks.
Noitom Robotics provides real data seeds
Zhixiang Future amplifies them 100x with generative models
The core of this cooperation lies in combining real data with generative technology, as well as leveraging the complementary technical advantages of both sides.
The value of real data lies in its irreplaceable physical grounding, the key prerequisite for keeping a model aligned with reality. The core value of generative technology, in turn, lies in eliminating the visual interference inherent in real-data collection and overcoming its limits in scale and diversity.
As a builder of the embodied intelligence data foundation, Noitom Robotics, relying on high-precision human motion capture and multimodal data collection infrastructure, provides the partnership with precise human motion data “seeds” from the real world.
These data come from real interactions in the physical world, featuring authentic and reliable physical feedback—laying a foundation of real physical laws for model training.
Zhixiang Future leverages its multimodal foundation model’s millimeter-level, highly controllable video generation capability, playing the role of a “data alchemist.”
By amplifying the multimodal, human-centric data captured with high precision by Noitom Robotics more than 100-fold, and generalizing it across visually diverse scenes, Zhixiang Future deeply fuses precise motion instructions with rich visual elements. This not only grows the data scale exponentially, but also ensures that every generated video frame remains accurately paired with the underlying motion data.
▲ Left: Original data collection scene of Noitom Robotics | Right: Generative processing results by Zhixiang Future
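The frame-level pairing described above can be sketched in code. This is a minimal illustrative model, not Zhixiang Future's actual pipeline: all names (`MotionFrame`, `PairedSample`, `amplify`) and the joint count are hypothetical.

```python
# Illustrative sketch of "real seed + generative amplification":
# one real mocap sequence is expanded across many generated visual
# scenes while every frame keeps a 1:1 link to its motion data.
from dataclasses import dataclass
from typing import List

@dataclass
class MotionFrame:
    timestamp: float           # seconds from sequence start
    joint_angles: List[float]  # mocap joint readings for this frame

@dataclass
class PairedSample:
    scene: str          # visual scene description used for generation
    video_frame_id: int # index of the generated frame within its video
    motion: MotionFrame # the exact mocap frame it is paired with

def amplify(seed: List[MotionFrame], scenes: List[str]) -> List[PairedSample]:
    """Expand one real mocap sequence across many visual scenes,
    preserving frame-to-motion alignment in every variant."""
    samples = []
    for scene in scenes:                  # each scene = one generated video
        for i, frame in enumerate(seed):  # frame indices stay aligned
            samples.append(PairedSample(scene, i, frame))
    return samples

seed = [MotionFrame(t * 0.033, [0.0] * 23) for t in range(3)]
variants = amplify(seed, [f"scene_{k}" for k in range(100)])  # 100x amplification
print(len(variants))  # prints 300: 100 scenes x 3 frames
```

The point of the sketch is that amplification multiplies the visual axis (scenes) while the motion axis (the real seed) is reused unchanged, which is why scale can grow 100-fold without loosening the frame-to-motion pairing.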
Among the parties' multiple in-depth technical collaborations is the use of video generation to remove the Vision Gap and visual interference artifacts from the data.
Under Fei-Fei Li's "three-layer pyramid":
Two major hurdles in real data collection
Why undertake this kind of cooperation? To answer that, you first need to clarify what kind of predicament embodied intelligence data is facing at present.
Fei-Fei Li, often called the "Godmother of AI," proposed a three-layer data pyramid for embodied data: the bottom layer is web data and human video; the middle layer, simulated and synthetic data; the top layer, real robot data.
For the top and bottom layers, the industry has already tried many approaches and has run into two especially severe problems:
On the one hand, there is an inherent contradiction between the cost of collecting real data and the visual generalization capability required by models. At its core, this contradiction is an imbalance between efficiency and quality—an industry pain point that has long been difficult to reconcile.
Standardized environments can significantly improve collection efficiency and reduce per-unit data costs. But enhancing a model's visual generalization requires diverse environments and object distributions that cover the complexity of the real world and its uncertainties.
On the other hand, when collecting high-precision multimodal data, the optical systems, inertial motion-capture suits, and tactile capture devices worn by subjects distort body shape, occlusion relationships, and the overall visual distribution, creating a pronounced "Vision Gap."
If the collected images are repaired in post-processing, although local regions can be filled in, the results are usually far from satisfactory and cannot meet the data quality requirements for training embodied intelligence models.
This further restricts the application of real data.
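The argument above — that patch-level inpainting cannot close the Vision Gap, while full regeneration can — can be illustrated with a toy example. Everything here is hypothetical: the pixel labels and both functions are illustrative stand-ins, not either company's method.

```python
# Toy illustration: a frame is a list of pixel labels, where "rig"
# marks pixels occluded by mocap hardware and "rig_shadow" marks
# secondary residue (shadows, straps) the rig leaves in the scene.

def local_inpaint(frame):
    """Fill only the pixels labeled as rig hardware. Surrounding
    residue is untouched, so the visual distribution still differs
    from a clean scene -- the Vision Gap survives."""
    return ["body" if px == "rig" else px for px in frame]

def full_regenerate(frame, scene):
    """Re-render every pixel conditioned on a new scene description
    (with the motion data preserved separately), leaving no trace
    of the capture setup."""
    return [scene] * len(frame)

frame = ["bg", "rig", "body", "rig_shadow", "bg"]
print(local_inpaint(frame))               # "rig" filled, "rig_shadow" survives
print(full_regenerate(frame, "kitchen"))  # every pixel re-rendered
```

The design point: inpainting operates inside a mask and inherits everything outside it, whereas generative regeneration replaces the whole visual layer, which is why the article's "generative removal of the Vision Gap" path regenerates frames rather than patching them.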
Exploring the third paradigm of data production
Tens of thousands of hours of data are already on the way
The cooperation between Zhixiang Future and Noitom Robotics is precisely a targeted breakthrough aimed at the industry pain points above, creating an entirely new data production paradigm: real collection + generative foundation model collaboration.
This paradigm not only avoids the shortcomings of a single data type, but also achieves complementary advantages between the two: it preserves the physical consistency of real data while breaking through the limitations of traditional collection methods in scene diversity and scale.
Pilot experiments by the two parties show that Zhixiang Future's generative model performs exceptionally well at eliminating the Vision Gap: it can effectively fill the visual shortcomings of real collected data and generate high-fidelity training data that obeys physical laws.
Through this technical path of generatively removing the Vision Gap, the two sides met the precision and physical-plausibility requirements for training data, and can now produce real, diverse training data at scale.
This provides abundant “fuel” for training a “world model” that can truly understand the physical world. The cooperating parties expect that within the year, the embodied intelligence video data generated through their collaboration will reach more than tens of thousands of hours.
Conclusion: Embodied intelligence enters the “mixed data” era
The industry regards 2026 as embodied intelligence's "year one of data," and this judgment is far from baseless.
In the past few years, the industry has been oscillating between “pure real data collection” and “pure virtual simulation,” and the ceiling of each approach is already evident. Real data has high precision but steep costs and limited scenarios. Simulated data has a large scale but questionable physical authenticity, making it difficult to cross the “gap from simulation to reality.” More and more people in the industry are starting to realize that whether relying solely on real collection or solely on virtual simulation, they can’t go far.
The cooperation between Zhixiang Future and Noitom Robotics lands precisely at this turning point, offering the industry a third route: an integrated "real data + generative expansion" paradigm that is expected to become new baseline infrastructure for the field.
Looking across the entire industry, this hybrid approach of “real capture data + generative foundation model enhancement to expand capacity” is becoming the choice for more and more companies. Embodied intelligence is entering the “mixed data” era.