Just as GPT can turn your prompts and work logs into reusable Skills, NVIDIA's ASPIRE distills a robot's repeated failures and fixes into experience it can call on later.

The difference is that instead of reviewing code, it reviews how the robot actually carried out the task.

Every time a robot performs a task, ASPIRE records each stage of the process: perception, navigation, grasping, collision detection, and motion planning.

The GPT/Claude model behind it acts like a researcher: it works out where the task went wrong and iterates on the program. Once a fix works, the distilled experience is written into a Skill.

This way, robots can continuously learn by writing code, reviewing execution trajectories, fixing programs, and accumulating skills.

But this is about more than distilling Skills from robot experience.

NVIDIA's robotics lead Jim Fan also stated that ASPIRE represents a brand-new continuous learning paradigm.

In this paradigm:

Training has shifted from gradient descent to continuous skill refinement;
The trained model no longer corresponds to just a set of floating-point weights, but to a continuously expanding library of sensorimotor skills;
Distributed training becomes a group of agents each practicing different skills, then aggregating their experience into a single skill library.
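To make the distributed-training point concrete, here is a minimal sketch, assuming a skill library is a dictionary keyed by skill name and each agent hands back the skills it verified during its own practice. The `Skill` fields, the `verified_successes` count, and the merge rule (keep whichever version was verified more often) are illustrative assumptions, not ASPIRE's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    # Hypothetical skill record: name, the failure it addresses, the repair
    # advice, and how many times the fix was verified in execution.
    name: str
    trigger: str
    repair: str
    verified_successes: int


def merge_libraries(*agent_libraries: dict[str, Skill]) -> dict[str, Skill]:
    """Aggregate per-agent skill libraries into one shared library.

    If several agents contributed a skill with the same name, keep the version
    that was verified more often (an assumed tie-breaking rule).
    """
    merged: dict[str, Skill] = {}
    for library in agent_libraries:
        for name, skill in library.items():
            current = merged.get(name)
            if current is None or skill.verified_successes > current.verified_successes:
                merged[name] = skill
    return merged


# Two agents practiced different things; one skill overlaps.
agent_a = {
    "multi_angle_approach": Skill("multi_angle_approach", "planner targets inside collision buffer",
                                  "re-approach from 45/90/180 degrees", 3),
}
agent_b = {
    "multi_angle_approach": Skill("multi_angle_approach", "planner targets inside collision buffer",
                                  "re-approach from 45/90/180 degrees", 5),
    "drawer_handle_filter": Skill("drawer_handle_filter", "handle detector fires on drawer edges",
                                  "drop detections outside the drawer front", 2),
}
shared = merge_libraries(agent_a, agent_b)
print(sorted(shared))  # ['drawer_handle_filter', 'multi_angle_approach']
```

Keying by name keeps the merge simple; a real system would likely also need to deduplicate skills that describe the same fix under different names.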

What's trained isn't necessarily weights

Before getting into how it changes the way robots are trained, some background.

ASPIRE stands for Agentic Skill Programming through Iterative Robot Exploration.

It lets a robot carry out tasks by running code, review the multimodal execution trajectory after a failure, fix the program, and store the lesson from that fix in an ever-growing skill library.

Each Skill is essentially still just context fed to a large model, but it encapsulates a proven Code Repair Pattern: it tells the robot how to modify its control program when it runs into a particular type of problem.
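Since a Skill ends up as text in the model's context, one way to picture it is as a structured record rendered into a prompt section. The sketch below assumes hypothetical field names (`when`, `how_to_fix`) and a plain-text layout; the article does not describe ASPIRE's actual Skill schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairSkill:
    # Hypothetical fields: the situation the pattern applies to and
    # what to change in the control program when it does.
    name: str
    when: str
    how_to_fix: str


def render_skills_as_context(skills: list[RepairSkill]) -> str:
    """Turn skills into a plain-text block that can be prepended to the agent's prompt."""
    lines = ["Known code-repair patterns:"]
    for skill in skills:
        lines.append(f"- {skill.name}: if {skill.when}, then {skill.how_to_fix}.")
    return "\n".join(lines)


context = render_skills_as_context([
    RepairSkill(
        name="multi_angle_approach",
        when="every planner target lies inside an obstacle's collision buffer",
        how_to_fix="re-approach the target from 45, 90 and 180 degrees until a collision-free path is found",
    )
])
print(context)
```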

For instance, suppose a robot is trying to pick up a radio: it has already identified the target but cannot get close to it.

The agent's analysis shows that the cause isn't a recognition error; instead, every target point the planner proposes falls inside the collision buffer zone of some obstacle.

From this experience, ASPIRE distills a new Skill:

If this kind of planning failure occurs, try re-approaching the target from different angles, such as 45°, 90°, and 180°, until a collision-free path is found.
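The geometry behind this Skill can be sketched in 2D. This is a simplified illustration, assuming circular obstacles, a fixed standoff distance, and a collision-free standoff pose as a stand-in for a full collision-free path; the offsets 45°, 90° and 180° come from the Skill above (0° is the original approach), while `standoff` and `buffer` are made-up values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obstacle:
    x: float       # obstacle centre (m)
    y: float
    radius: float  # footprint radius (m)


def in_collision_buffer(px, py, obstacles, buffer):
    """True if point (px, py) is within any obstacle's footprint plus the safety buffer."""
    return any(math.hypot(px - o.x, py - o.y) < o.radius + buffer for o in obstacles)


def find_approach_pose(target, base_heading_deg, obstacles,
                       standoff=0.6, buffer=0.15, offsets_deg=(0, 45, 90, 180)):
    """Rotate the approach direction around the target until the standoff pose is clear.

    Returns (x, y, yaw_deg) facing the target, or None if every candidate is blocked.
    """
    tx, ty = target
    for offset in offsets_deg:
        angle = math.radians(base_heading_deg + offset)
        # Stand `standoff` metres back from the target along the approach direction.
        px = tx - standoff * math.cos(angle)
        py = ty - standoff * math.sin(angle)
        if not in_collision_buffer(px, py, obstacles, buffer):
            yaw = math.degrees(math.atan2(ty - py, tx - px))
            return px, py, yaw
    return None


# The nominal pose (offset 0) lands inside an obstacle's buffer; the 45° retry is clear.
pose = find_approach_pose((2.0, 0.0), 0.0, [Obstacle(1.4, 0.0, 0.2)])
print(pose)
```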

Later, in similar scenarios, whether the target is a radio, a microwave, or a piece of furniture, this experience can be reused directly instead of being rediscovered from scratch.

At this point, you might be wondering: isn't robot training supposed to be about data collection, gradient descent, model weights, real-robot data, and sim-to-real transfer?

Why has it suddenly become about accumulating skills?

First, let's talk about a recently popular paradigm, Code as Policy.

Unlike end-to-end policy models such as VLAs, Code as Policy doesn't output robot actions directly; instead, it has a large model write an executable robot control program.

This program can call perception modules, planning APIs, and control primitives, such as recognizing objects, planning paths, moving robotic arms, and executing grasps.
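A minimal sketch of what such a program looks like, assuming a hypothetical `RobotAPI` layer with stubbed primitives (`detect`, `plan_path`, `move_arm`, `grasp`); none of these names come from ASPIRE, and real primitives would wrap actual perception, planning and control stacks.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    position: tuple[float, float, float]


class RobotAPI:
    """Hypothetical primitive layer with stub implementations."""

    def __init__(self, scene):
        self.scene = scene  # label -> 3D position, stands in for a perception model
        self.log = []       # record of primitive calls, so behavior can be inspected

    def detect(self, label):
        self.log.append(("detect", label))
        pos = self.scene.get(label)
        return Detection(label, pos) if pos is not None else None

    def plan_path(self, goal):
        self.log.append(("plan_path", goal))
        return [goal]  # trivial one-waypoint "plan"

    def move_arm(self, path):
        self.log.append(("move_arm", len(path)))

    def grasp(self):
        self.log.append(("grasp",))
        return True


# The kind of program a model might write for "pick up the radio".
def pick_up(robot, label):
    target = robot.detect(label)
    if target is None:
        return False
    path = robot.plan_path(target.position)
    robot.move_arm(path)
    return robot.grasp()


robot = RobotAPI({"radio": (0.5, 0.1, 0.8)})
print(pick_up(robot, "radio"))          # True
print([call[0] for call in robot.log])  # ['detect', 'plan_path', 'move_arm', 'grasp']
```

Because the policy is ordinary code, every step is visible in `robot.log`, which is what makes inspection and repair possible.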

This way, robot behavior is no longer fully hidden in neural network weights but becomes executable operational code.

Once behavior is code, it can be inspected, modified, debugged, and further optimized by today's highly capable agent models.

However, Code as Policy has traditionally had two problems.

First, when a robot fails, the system typically only knows "the task is not completed," but doesn't know whether the perception was wrong, the grasp was unstable, the path planning caused a collision, or the recovery action had an issue.

Second, and more critically, it doesn't remember.

Once a task is finished, the fixes, recovery strategies, and prompt-writing tricks discovered during debugging are thrown away, so the next time a similar problem comes up, the whole process has to be repeated.

That's why Jim Fan says:

(With ASPIRE) by its 100th task, a robot is finally no longer as clueless as it was on its first.

In essence, the whole process mirrors how human robotics engineers work:

When a robot program fails, the engineer replays the execution process, reviews the perception results, analyzes the motion trajectory, and determines whether the grasp was wrong, the planning was off, or a recovery action failed.

After fixing it, the engineer makes a note of the lesson. The next time they face objects at the edge of a table, drawer handles, or navigation through a narrow space, they don't start from scratch.

What ASPIRE does is delegate this experience accumulation mechanism to an agent. It doesn't just have the large model write robot code; it also has the large model repeatedly try, watch, and fix in the execution environment, ultimately distilling the verified fixes into Skills.
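The try, watch, fix, distill cycle can be sketched as a small loop. This is a minimal illustration, assuming the caller supplies `run`, `propose_fix` and `distill` callables (hypothetical names); in ASPIRE the fix and distillation steps are performed by a large model, whereas here they are toy functions so the sketch runs.

```python
def refine_and_distill(program, run, propose_fix, distill, skills, max_attempts=5):
    """Try a program, inspect the failure trace, fix it, and store a verified lesson.

    run(program) -> (succeeded, trace)
    propose_fix(program, trace, skills) -> new program
    distill(failure_trace, fixed_program) -> (skill_name, skill_text)
    """
    first_failure = None
    for _ in range(max_attempts):
        succeeded, trace = run(program)
        if succeeded:
            if first_failure is not None:
                # Only a fix that actually worked is written into the library.
                name, text = distill(first_failure, program)
                skills[name] = text
            return program, True
        if first_failure is None:
            first_failure = trace
        program = propose_fix(program, trace, skills)
    return program, False


# Toy stand-ins: the "program" is just an approach-angle offset in degrees,
# and only offsets of 90 or more avoid the obstacle buffer.
def run(offset):
    ok = offset >= 90
    return ok, {"stage": "planning", "error": None if ok else "target in collision buffer"}


def propose_fix(offset, trace, skills):
    return offset + 45


def distill(trace, offset):
    return "multi_angle_approach", f"On '{trace['error']}', re-approach at +{offset} degrees."


skills = {}
program, ok = refine_and_distill(0, run, propose_fix, distill, skills)
print(program, ok)  # 90 True
print(skills)
```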

So, in ASPIRE, training is no longer just gradient descent.

The training process becomes Skill Refinement; the training output is not only model weights but also a Skills Library that continuously grows as the robot accumulates experience.

Three-stage pipeline

In the paper, this idea is implemented as a three-stage pipeline.

First is the robot execution engine.

After a traditional robot program fails, the system may only tell you the task isn't complete.

ASPIRE breaks the failure down by recording the inputs, outputs, visual evidence, and error logs of every perception, planning, grasping, and control call.
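Per-call recording of this kind can be sketched with a wrapper around each primitive. This is an assumption-laden illustration: the `ExecutionTrace` class and the stage names are hypothetical, and it records only text (arguments, outputs, errors); a real engine would also attach camera frames or other visual evidence.

```python
import functools


class ExecutionTrace:
    """Records the stage, inputs, output and error of every wrapped primitive call."""

    def __init__(self):
        self.records = []

    def wrap(self, stage, fn):
        @functools.wraps(fn)
        def traced(*args, **kwargs):
            record = {"stage": stage, "call": fn.__name__, "args": args, "kwargs": kwargs}
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                # Appended whether the call succeeded or failed.
                self.records.append(record)
        return traced

    def first_failure(self):
        return next((r for r in self.records if "error" in r), None)


trace = ExecutionTrace()


def detect(label):
    return {"label": label, "xyz": (0.5, 0.1, 0.8)}


def plan(goal):
    raise RuntimeError("all goal poses inside collision buffer")


detect = trace.wrap("perception", detect)
plan = trace.wrap("planning", plan)

try:
    plan(detect("radio")["xyz"])
except RuntimeError:
    pass

failure = trace.first_failure()
print(failure["stage"], "|", failure["error"])
# planning | RuntimeError: all goal poses inside collision buffer
```

Instead of a bare "task not completed", the agent can see that perception succeeded and planning is where things broke.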

It's like a human engineer debugging a robot by replaying videos, checking trajectories, and working out whether perception was wrong or the grasp failed. ASPIRE hands this process to a coding agent.

Next is the skill library. After fixing the program, the agent doesn't discard the experience; it distills it into reusable knowledge.

The officially released skill library contains very specific entries, such as how to write SAM3 text prompts, how to approach objects at the edge of a table from multiple angles, how to filter out false detections of drawer handles, and which motion primitive to use when pushing planar objects.
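The article doesn't say how relevant entries are picked out of the library when a new failure occurs. As one simple possibility, the sketch below tags the four example entries above with keywords and ranks them by tag overlap with the observed failure; the tags, entry names and scoring rule are all hypothetical.

```python
# Hypothetical tags for the four example entries mentioned above.
SKILL_LIBRARY = {
    "sam3_text_prompts": {"perception", "segmentation", "prompt"},
    "table_edge_multi_angle_approach": {"planning", "collision", "table_edge", "approach"},
    "drawer_handle_false_detections": {"perception", "drawer", "handle", "false_positive"},
    "planar_object_push_primitive": {"control", "push", "planar_object"},
}


def retrieve_skills(failure_tags, library=SKILL_LIBRARY, top_k=2):
    """Rank skills by how many tags they share with the observed failure."""
    scored = [(len(tags & failure_tags), name) for name, tags in library.items()]
    scored = [item for item in scored if item[0] > 0]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


print(retrieve_skills({"planning", "collision", "approach"}))
# ['table_edge_multi_angle_approach']
```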

These aren't like traditional model weights; they're more like a robot programmer's notes on pitfalls.

Finally, there's evolutionary search.

The agent doesn't just follow a single repair path. The system generates multiple candidate control programs, runs them in the execution environment, and keeps iterating from the programs that survive and the trajectories of those that fail.
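The search over candidate programs can be sketched as a minimal elitist evolutionary loop. In ASPIRE the mutation step is a coding agent editing a control program and the score comes from running it; here, as a stated simplification, a "program" is a single approach angle, `mutate` is a random ±45° tweak, and `evaluate` is a toy score, purely so the sketch runs.

```python
import random


def evolve(initial, evaluate, mutate, population_size=8, survivors=3, generations=10, seed=0):
    """Minimal elitist evolutionary search over candidate programs.

    evaluate(program) -> score, higher is better (e.g. success rate over rollouts).
    mutate(program, rng) -> new candidate (e.g. an agent-proposed edit).
    """
    rng = random.Random(seed)
    population = [initial] + [mutate(initial, rng) for _ in range(population_size - 1)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        elite = ranked[:survivors]
        history.append(evaluate(elite[0]))
        # Keep the surviving programs and refill the population with their mutations.
        children = [mutate(rng.choice(elite), rng) for _ in range(population_size - survivors)]
        population = elite + children
    best = max(population, key=evaluate)
    return best, history


best, history = evolve(
    0,
    evaluate=lambda angle: -abs(angle - 90),
    mutate=lambda angle, rng: angle + rng.choice([-45, 45]),
)
print(best, history)
```

Carrying the elite over unchanged means the best score never gets worse from one generation to the next.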

In software engineering, the loop of writing code, running tests, checking traces, and fixing bugs is already routine for coding agents. What ASPIRE does is transplant this loop into the physical world.

Experimental validation

To validate this method, the paper tested it on three classic robot benchmarks: LIBERO-Pro, Robosuite, and BEHAVIOR-1K, covering generalized manipulation, contact-rich manipulation, and long-horizon household tasks.

Overall, ASPIRE significantly outperformed previous Code as Policy methods.

For example, in the Bimanual Handover task in Robosuite, ASPIRE improved success rates from 20% to 92%.
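For scale, using only the two success rates reported for Bimanual Handover:

```latex
\text{absolute gain} = 92\% - 20\% = 72 \text{ percentage points} \\
\text{relative gain} = \frac{92\%}{20\%} = 4.6\times \\
\text{failure rate: } 80\% \rightarrow 8\%, \text{ i.e. } \tfrac{80}{8} = 10\times \text{ fewer failures}
```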

On generalization, the study first built up a Skill Library on LIBERO-90, then transferred it directly to the unseen LIBERO-Pro Long tasks, without any further training or updates to the skill library.

Results showed that as the skill library grew, the robot's success rate on the new tasks rose steadily, from nearly zero to 31%. In other words, the thicker the Skill Library, the less the robot behaves like a novice.
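The evaluation protocol described here (freeze the library, then test on unseen tasks) can be sketched as follows. This is an illustration under stated assumptions: `solve` is a caller-supplied, hypothetical function, and the toy rule "a task succeeds only if its required skill is in the library" is invented so the code runs. It is not the paper's evaluator and produces no real results.

```python
from types import MappingProxyType


def transfer_curve(library_snapshots, held_out_tasks, solve):
    """Evaluate each library snapshot on unseen tasks without updating it.

    solve(task, library) -> bool. The library is passed read-only, so no new
    skills can be written during transfer.
    """
    curve = []
    for snapshot in library_snapshots:
        frozen = MappingProxyType(dict(snapshot))
        successes = sum(solve(task, frozen) for task in held_out_tasks)
        curve.append((len(snapshot), successes / len(held_out_tasks)))
    return curve


# Toy stand-in: a task succeeds only if the skill it needs is in the library.
def solve(task, library):
    return task in library


tasks = ["approach", "push", "segment", "open_drawer"]
snapshots = [{}, {"approach": "..."}, {"approach": "...", "push": "...", "segment": "..."}]
print(transfer_curve(snapshots, tasks, solve))
# [(0, 0.0), (1, 0.25), (3, 0.75)]
```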

Author introduction

At the end of the technical blog, NVIDIA also released the full list of authors.

It's still the familiar faces from the GEAR team: Jim Fan, Yuke Zhu, Guanzhi Wang, Guanya Shi, and others.

The first three authors contributed equally.

Among them, Runyu Lu is a second-year PhD student at the University of Michigan, currently interning at GEAR; Yuubo Wu is from the University of Illinois at Urbana-Champaign (UIUC); and Ethan Kou is an undergraduate at the University of California, Berkeley.

Notably, just yesterday, NVIDIA also announced it was expanding recruitment for its robotics team in China, opening several positions in Beijing, Shanghai, and Shenzhen across embodied intelligence, simulation, robot deployment, and solution architecture.

Source: Quantum Bit

The Skill Moment of Embodied Intelligence! NVIDIA Open-Sources Robot Skill Library. Jim Fan: The Paradigm Has Changed
