From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

  • 2025-10-16 17:57:47
  • Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Yibo Peng, Tao Huang, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Chang Xu

Abstract

Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and unreliable. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking accuracy, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a general foundation for vision-language-action humanoid systems.

 
