Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

  • 2025-08-07 17:59:44
  • Yue Liao, Pengfei Zhou, Siyuan Huang, Donglin Yang, Shengcong Chen, Yuxin Jiang, Yue Hu, Jingbin Cai, Si Liu, Jianlan Luo, Liliang Chen, Shuicheng Yan, Maoqing Yao, Guanghui Ren

Abstract

We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.

 
