World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation

Abstract

Robotic manipulation policies are commonly initialized through imitation learning, but their performance is limited by the scarcity and narrow coverage of expert data. Reinforcement learning can refine policies to alleviate this limitation, yet real-robot training is costly and unsafe, while training in simulators suffers from the sim-to-real gap. Recent advances in generative models have demonstrated remarkable capabilities in real-world simulation, with diffusion models in particular excelling at generation. This raises the question of how diffusion model-based world models can be used to enhance pre-trained policies in robotic manipulation. In this work, we propose World4RL, a framework that employs diffusion-based world models as high-fidelity simulators to refine pre-trained policies entirely in imagined environments for robotic manipulation. Unlike prior works that primarily employ world models for planning, our framework enables direct end-to-end policy optimization. World4RL is designed around two principles: pre-training a diffusion world model that captures diverse dynamics on multi-task datasets, and refining policies entirely within a frozen world model to avoid online real-world interactions. We further design a two-hot action encoding scheme tailored for robotic manipulation and adopt diffusion backbones to improve modeling fidelity. Extensive simulation and real-world experiments demonstrate that World4RL provides high-fidelity environment modeling and enables consistent policy refinement, yielding significantly higher success rates compared to imitation learning and other baselines. More visualization results are available at https://world4rl.github.io/.
