Abstract
Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional diffusion model, we illustrate how we may circumvent the need for dynamic programming and thereby eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making.
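To make "return-conditional diffusion model" and "composition" concrete, here is a minimal sketch of conditional sampling. The abstract does not specify the sampler, so several pieces are assumptions. The sketch uses a DDPM-style reverse process with a linear noise schedule and classifier-free guidance. It combines several conditions (for example, two skills) by summing their guidance terms. `toy_eps_model` is a hypothetical stand-in for a trained noise predictor. The guidance weight `w` and the trajectory shape are illustrative values, not settings from the paper.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t, y).

    `cond` is None for the unconditional branch, otherwise a conditioning
    vector (e.g. a normalized return, a constraint, or a skill embedding).
    """
    shift = 0.0 if cond is None else 0.1 * float(np.mean(cond))
    return 0.5 * x - shift


def guided_eps(eps_model, x, t, conds, w):
    """Classifier-free guidance; with several conditions, sum their guidance terms (assumed rule)."""
    eps_uncond = eps_model(x, t, None)
    guidance = sum(eps_model(x, t, c) - eps_uncond for c in conds)
    return eps_uncond + w * guidance


def sample(eps_model, shape, conds, w=1.2, num_steps=50, seed=0):
    """Draw a trajectory-shaped sample by running the DDPM reverse process from pure noise."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(shape)
    for t in reversed(range(num_steps)):
        eps = guided_eps(eps_model, x, t, conds, w)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Condition on a single (hypothetical) high normalized return.
traj = sample(toy_eps_model, (16, 4), conds=[np.array([0.9])])

# Compose two (hypothetical) skill conditions at test time.
composed = sample(toy_eps_model, (16, 4), conds=[np.array([1.0]), np.array([-1.0])])
```

In this setting no value function or Bellman backup appears anywhere. Behavior is steered only by what the denoiser is conditioned on, which is the sense in which dynamic programming is sidestepped. Summing guidance terms is one simple way to combine conditions that were each seen separately during training.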