Unitree validates a new trend: The core battlefield of embodied intelligence is not just models.


The competition in embodied intelligence is entering a new phase. With Unitree's release of the WVLA 2.0 embodied large model and a real-world demonstration completed without remote control, the industry increasingly recognizes that the core barrier in this race is not model scale alone but full-stack capability: low-latency architecture design, hardware-software co-integration, and accumulated embodied data.

According to a research report published by Nomura International on June 28, analysts visited Unitree on site on June 15. During the demonstration, the G1 robot, running WVLA 2.0 (World-model Vision-Language-Action), autonomously completed six consecutive tasks in a conference room with interference present and without any remote control. The inference loop took about 90ms, or roughly 11 iterations per second. This is the first version with commercial deployment potential after two years of research and development at Unitree. Management identified industrial manufacturing (joint motor assembly, loading/unloading, and fixture handling) as the earliest commercial deployment scenario and regards large-scale physical operation data from its global robot fleet as a core asset.
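As a quick sanity check on the loop figure, the conversion from per-pass latency to iteration rate can be sketched as follows. The helper name `loop_rate_hz` is illustrative and not taken from the report; only the 90ms figure comes from the text.

```python
def loop_rate_hz(latency_ms: float) -> float:
    """Return iterations per second for a loop that takes latency_ms per pass."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# 90 ms is the inference-loop time cited in the report.
print(f"{loop_rate_hz(90):.1f} iterations per second")
```

A 90ms loop therefore works out to a little over 11 passes per second, assuming the passes run back to back with no idle time between them.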

The Nomura report also outlined NXP's NeuralAxis architecture framework, released at COMPUTEX 2026. The framework, spearheaded by NXP President and CEO Rafael Sotomayor, aligns closely with Unitree's engineering approach: the true bottleneck for physical AI is not the inference scale of language models but the ability to build edge control layers with latency as low as 40ms, akin to the human spinal reflex.

The direct implication for investors: the competitive landscape of embodied intelligence is shifting from 'whose model is stronger' to 'whose system is more complete.' Unitree's moat, built on full-stack in-house integration combined with its embodied data advantage, is difficult for pure cloud model suppliers to replicate.

NeuralAxis: Redefining the Architectural Boundaries of Physical AI Systems

NXP's NeuralAxis (neural axis architecture) framework, inspired by the human nervous system, decomposes physical AI control logic into three decoupled yet coordinated layers: a reasoning layer (latency ~300ms) corresponding to the cerebral cortex; a coordination layer, responsible for motion control and balance, corresponding to the cerebellum; and a reflex layer corresponding to the spinal cord, with latency as low as 40ms and deployed at the edge near the actuators.
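The three-layer split can be written down as a small data structure to make the latency tiers concrete. This is only an illustrative sketch: the `Layer` class and `layers_meeting` helper are hypothetical names, not part of any NXP API, and the coordination layer's latency is left unset because the article gives no figure for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str                    # layer name as used in the article
    analogy: str                 # nervous-system analogy from the article
    latency_ms: Optional[float]  # None where the article states no figure


LAYERS = [
    Layer("reasoning", "cerebral cortex", 300.0),
    Layer("coordination", "cerebellum", None),
    Layer("reflex", "spinal cord", 40.0),
]


def layers_meeting(deadline_ms: float) -> List[str]:
    """Names of layers whose stated latency fits within deadline_ms."""
    return [
        layer.name
        for layer in LAYERS
        if layer.latency_ms is not None and layer.latency_ms <= deadline_ms
    ]


print(layers_meeting(50))   # only the reflex layer can answer this quickly
print(layers_meeting(500))  # reasoning and reflex both fit
```

The point of the sketch is that a reaction needed within tens of milliseconds can only be served by the reflex tier, which is why the framework places that tier next to the actuators.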

For humanoid robots, the implications of this framework are the most profound.

NeuralAxis advocates replacing a single "central brain" with distributed reflex processors. Local autonomous decision-making is deployed at the joints, hands, and feet, so that actions such as grip control and ankle balance execute locally and a chained recovery of balance, grasp, posture, and gait completes within 40ms. Decoupling inference from motion control also allows new skills to be added continuously while motion stability is maintained.
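A toy example helps show why local reflexes matter. The sketch below assumes a hypothetical `GripReflex` controller sitting next to a hand actuator, with a simple proportional correction and an illustrative gain; none of this is from NXP or Unitree. It counts how many local corrections fit inside one ~300ms reasoning pass when the reflex loop ticks every 40ms.

```python
from dataclasses import dataclass


@dataclass
class GripReflex:
    """Hypothetical local controller placed next to a hand actuator."""
    target_force: float  # setpoint handed down by the slower reasoning layer
    gain: float = 0.5    # illustrative proportional gain, not a real parameter

    def step(self, measured_force: float) -> float:
        """Return a force correction computed locally, with no round trip."""
        return self.gain * (self.target_force - measured_force)


def simulate(reflex: GripReflex, force: float, ticks: int) -> float:
    """Apply the local correction for the given number of reflex ticks."""
    for _ in range(ticks):
        force += reflex.step(force)
    return force


REASONING_MS = 300  # reasoning-layer latency cited in the article
REFLEX_MS = 40      # reflex-layer latency cited in the article
ticks = REASONING_MS // REFLEX_MS  # full reflex ticks per reasoning pass

# The grip has slipped from a 10.0 setpoint down to 4.0 (made-up units).
recovered = simulate(GripReflex(target_force=10.0), force=4.0, ticks=ticks)
print(ticks, round(recovered, 3))
```

Under these made-up numbers the local loop gets seven corrections in and has nearly restored the grip before a centralized planner would have produced a single new command.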

The commercial extension of this framework is also noteworthy. Nomura's industry research indicates that compared to traditional automation solutions, the NeuralAxis architecture can bring significant manufacturing efficiency improvements, and diagnostic robot sales are expected to grow significantly. In addition, the same architecture can compress end-to-end latency for drones to below 20ms and layer the control logic of software-defined vehicles into reasoning, coordination, and safety-critical execution domains.

WVLA 2.0: Model Fusion and Hardware-Software Co-Design

Unitree's technical route with WVLA 2.0 reveals a clear divergence from mainstream approaches.

Most similar solutions bet on pure VLA (Vision-Language-Action) end-to-end generation. In contrast, WVLA 2.0 integrates the predictive capabilities of the WMA (World-Model Action) model with the action generation of VLA, achieving comprehensive upgrades in high-level task understanding, 2D/3D spatial semantic reasoning, dynamics-constrained action generation, and anti-interference capability.

At the perception level, the system fuses four parallel sensor streams (one RealSense depth camera, one Livox MID360 LiDAR, and two side-facing cameras) into a 360-degree spatial representation. Position update latency under interference is held within 10ms. On the hardware-software co-design side, the action parameters produced by inference are sent over the CAN bus to the G1's 23 joint degrees of freedom. With Unitree's self-developed "cerebellum" motion control module, single-arm positioning error when grasping objects under 2kg is held within 5mm.
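To make the 5mm figure concrete, the sketch below checks a grasp against that tolerance. It assumes, as an interpretation not stated in the article, that "positioning error" means the straight-line 3D distance between the target and the achieved gripper position; the function names and coordinates are illustrative.

```python
import math
from typing import Sequence

MAX_ERROR_MM = 5.0  # tolerance cited in the article


def positioning_error_mm(target: Sequence[float], actual: Sequence[float]) -> float:
    """Euclidean distance in millimetres between target and achieved position."""
    return math.dist(target, actual)


def within_tolerance(target: Sequence[float], actual: Sequence[float]) -> bool:
    """True if the miss distance is no larger than the cited 5 mm bound."""
    return positioning_error_mm(target, actual) <= MAX_ERROR_MM


# Made-up coordinates: the gripper misses by 3 mm in x and 4 mm in y.
target = (100.0, 200.0, 300.0)
actual = (103.0, 204.0, 300.0)
print(positioning_error_mm(target, actual), within_tolerance(target, actual))
```

A per-axis definition of error would be a looser test than this one, so how the report's figure was measured matters when comparing it with other systems.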

At the computing level, WVLA 2.0 keeps its edge compute requirement below 100 TOPS and runs entirely on the NVIDIA (NVDA US, unrated) Jetson Orin NX fitted to the G1 EDU, with no reliance on the cloud. Management stated that this design avoids the risk of task interruption caused by network latency or disconnection.

Data Paradigm Shift: "No-Body Data Collection" Becomes Mainstream

The shift in data collection mode is another significant signal from this report.

Unitree's demonstration shows that, in a single recording session with no teleoperation, the G1 can autonomously complete multiple consecutive tasks in an environment with interference, which suggests that "no-body data collection" is becoming the mainstream paradigm for embodied intelligence data production. In other words, robots accumulate data through their own perception and decision-making rather than through manual teleoperation and labeling.

Nomura's industry research also highlights current limitations: The system still has blind spots and rear perception gaps, execution speed is relatively slow, fine manipulation accuracy is insufficient, and there is a lack of quantified baseline data for continuous success rates. These shortcomings also define the priority boundaries for near-term commercial deployment.
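The missing baseline for consecutive success rates matters because errors compound across a task chain. The sketch below assumes, purely for illustration, that each task succeeds independently with the same probability; the per-task rates are hypothetical and do not come from the report.

```python
def chain_success_rate(per_task_rate: float, n_tasks: int) -> float:
    """Probability that n_tasks independent tasks all succeed in a row."""
    if not 0.0 <= per_task_rate <= 1.0:
        raise ValueError("per_task_rate must be between 0 and 1")
    if n_tasks < 0:
        raise ValueError("n_tasks must be non-negative")
    return per_task_rate ** n_tasks


# Six tasks matches the length of the demonstrated chain; the rates are made up.
for p in (0.90, 0.95, 0.99):
    print(f"per-task {p:.2f} -> six-task chain {chain_success_rate(p, 6):.3f}")
```

Under this simple model, a robot that succeeds on 95% of individual tasks would finish a six-task chain only about 74% of the time, which is why a single successful demonstration says little without quantified repeat-run data.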

Given these limitations, management has laid out a phased deployment roadmap. Industrial manufacturing (joint motor assembly, loading/unloading, fixture handling) comes first because Unitree's own factory provides a data loop. Logistics sorting and flexible 3C assembly follow. Home and medical care scenarios are longer-term goals, because open, unstructured environments are significantly harder.

Full-Stack Integration: Two Dimensions of Unitree's Differentiation Barrier

The core conclusion of the Nomura report boils down to a judgment: In the commercialization of embodied intelligence, model capabilities are certainly important, but they are not the sole decisive variable.

Unitree management defines the company's differentiated competitiveness in two dimensions: First, full-stack self-developed integration capability from perception and model to motion control; second, large-scale physical operation data accumulated from global robot fleets. These two assets reinforce each other—self-developed hardware generates exclusive data, and data feeds back into model iteration, forming a closed loop that is difficult for cloud model suppliers to penetrate.

From a competitive standpoint, the deployment logic of the NeuralAxis framework and WVLA 2.0 points to the same conclusion: the core battlefield of embodied intelligence is unfolding simultaneously at the system architecture layer and the data layer. For investors, the criteria for evaluating players in this sector need to extend beyond "model capability" alone to system integration capability and the scale of embodied data accumulation.


The above content comes from the Chasing Wind Trading Desk.


Risk Warning and Disclaimer

          

The market carries risks; investment requires caution. This article does not constitute personal investment advice and has not considered the specific investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their particular circumstances. Investment based on this is at your own risk.