Robot Learning from a Physical World Model

  • 2025-11-10 18:59:07
  • Jiageng Mao, Sicheng He, Hao-Ning Wu, Yang You, Shuyang Sun, Zhicheng Wang, Yanan Bao, Huizhong Chen, Leonidas Guibas, Vitor Guizilini, Howard Zhou, Yue Wang

Abstract

We introduce PhysWorld, a framework that enables robot learning from video generation through physical world modeling. Recent video generation models can synthesize photorealistic visual demonstrations from language commands and images, offering a powerful yet underexplored source of training signals for robotics. However, directly retargeting pixel motions from generated videos to robots neglects physics, often resulting in inaccurate manipulations. PhysWorld addresses this limitation by coupling video generation with physical world reconstruction. Given a single image and a task command, our method generates task-conditioned videos and reconstructs the underlying physical world from the videos, and the generated video motions are grounded into physically accurate actions through object-centric residual reinforcement learning with the physical world model. This synergy transforms implicit visual guidance into physically executable robotic trajectories, eliminating the need for real robot data collection and enabling zero-shot generalizable robotic manipulation. Experiments on diverse real-world tasks demonstrate that PhysWorld substantially improves manipulation accuracy compared to previous approaches. Visit the project webpage (https://pointscoder.github.io/PhysWorld_Web/) for details.
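
As a rough illustration of the residual idea mentioned in the abstract, the sketch below shows the generic residual reinforcement learning pattern: a base action (here imagined as coming from motion extracted from a generated video) is adjusted by a small, bounded learned correction. This is not the PhysWorld implementation. The names `residual_action`, `residual_policy`, and `scale`, the clipping range, and the 3-D action layout are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def residual_action(
    base_action: Sequence[float],
    residual_policy: Callable[[Sequence[float]], Sequence[float]],
    observation: Sequence[float],
    scale: float = 0.1,
) -> List[float]:
    """Combine a base action with a bounded learned correction.

    All names here are hypothetical. `base_action` stands in for an action
    derived from video motion; `residual_policy` stands in for a policy
    trained with reinforcement learning in a simulated physical world model.
    """
    correction = residual_policy(observation)
    if len(correction) != len(base_action):
        raise ValueError("correction and base action must have the same length")
    # Clip each correction term to [-1, 1], then scale it, so the learned
    # policy can only nudge the base action rather than replace it.
    return [
        b + scale * max(-1.0, min(1.0, c))
        for b, c in zip(base_action, correction)
    ]


def toy_policy(observation: Sequence[float]) -> List[float]:
    """Stand-in for a trained policy: returns a fixed correction."""
    return [0.5, -2.0, 0.0]


# Hypothetical 3-D end-effector displacement taken from video motion.
action = residual_action([0.10, 0.00, 0.25], toy_policy, observation=[0.0])
```

Bounding the correction is a common design choice in residual learning: the base action supplies the coarse motion, and the learned term only has to account for what the base gets wrong, such as contact or friction effects.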

 
