From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

  • 2025-10-17 16:37:59
  • Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Yibo Peng, Tao Huang, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Chang Xu

Abstract

Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and untrustworthy. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking precision, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a universal foundation for vision-language-action humanoid systems.

 
