Abstract
Sim2real for robotic manipulation is difficult due to the challenges of simulating complex contacts and generating realistic task distributions. To tackle the latter problem, we introduce ManipGen, which leverages a new class of policies for sim2real transfer: local policies. Locality enables a variety of appealing properties, including invariance to absolute robot and object pose, skill ordering, and global scene configuration. We combine these policies with foundation models for vision, language, and motion planning, and demonstrate SOTA zero-shot performance of our method on Robosuite benchmark tasks in simulation (97%). We transfer our local policies from simulation to reality and observe that they can solve unseen long-horizon manipulation tasks with up to 8 stages and significant variation in pose, object, and scene configuration. ManipGen outperforms SOTA approaches such as SayCan, OpenVLA, LLMTrajGen, and VoxPoser across 50 real-world manipulation tasks by 36%, 76%, 62%, and 60%, respectively. Video results at https://mihdalal.github.io/manipgen/