Joint research by five universities enables digital humans to autonomously navigate 3D scenes using vision, with success rates about 30 percentage points higher than the strongest baselines.


ME News Report, April 14 (UTC+8): According to 1M AI News monitoring, a joint team from Peking University, Carnegie Mellon University, Tongji University, the University of California, Los Angeles, and the University of Michigan has released VGHuman on arXiv, an embodied AI framework that lets digital humans autonomously navigate unfamiliar 3D environments based solely on visual perception. Earlier digital human systems were generally driven by preset scripts or privileged state information; VGHuman's premise is to give digital humans real eyes, so that they can see the way, plan, and act on their own. The framework consists of two layers. The World Layer reconstructs a 3D Gaussian scene with semantic annotations and collision meshes from monocular video, using an occlusion-aware design intended to recognize small, occluded objects even in complex outdoor environments. The Agent Layer equips the digital human with first-person RGB-D (color + depth) perception and produces plans through spatially aware visual cues and iterative reasoning; a diffusion model then turns those plans into full-body motion sequences that animate the character. In navigation benchmarks covering 200 test scenes at three difficulty levels (simple path, obstacle avoidance, and dynamic pedestrians), VGHuman's task success rate exceeds that of the strongest baselines, such as NaVILA, NaVid, and Uni-NaVid, by about 30 percentage points, with comparable or lower collision rates. The framework also supports a variety of movement styles, such as running and jumping, as well as long-horizon planning for visiting multiple targets in sequence. The team plans to open-source the code and models, and a GitHub repository has already been set up. (Source: BlockBeats)
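
To make the "about 30 percentage points" figure concrete, here is a minimal sketch of how a success-rate gap and a collision rate could be computed from per-episode results. This is not VGHuman's evaluation code: the `Episode` record, the difficulty labels, the helper names, and all data below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One navigation run (hypothetical record layout, not from the paper)."""
    difficulty: str  # assumed labels: "simple", "obstacle", "dynamic"
    success: bool    # the agent reached its target
    collided: bool   # at least one collision occurred during the run


def rate(episodes, attr):
    """Percentage of episodes for which the boolean field `attr` is True."""
    if not episodes:
        raise ValueError("no episodes to evaluate")
    return 100.0 * sum(getattr(e, attr) for e in episodes) / len(episodes)


def point_gap(ours, baseline):
    """Success-rate difference in percentage points (not relative percent)."""
    return rate(ours, "success") - rate(baseline, "success")


# Made-up results, purely for illustration: 10 episodes per method.
ours = [Episode("simple", True, False)] * 8 + [Episode("dynamic", False, True)] * 2
baseline = [Episode("simple", True, False)] * 5 + [Episode("dynamic", False, True)] * 5

gap = point_gap(ours, baseline)
print(f"success: {rate(ours, 'success'):.1f}% vs {rate(baseline, 'success'):.1f}%")
print(f"gap: {gap:.1f} percentage points")
print(f"collision rate: {rate(ours, 'collided'):.1f}% vs {rate(baseline, 'collided'):.1f}%")
```

In this toy data, moving from a 50% to an 80% success rate is a gap of 30 percentage points, which is a 60% relative improvement; the two ways of stating the difference are not interchangeable.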
