DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Abstract

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily because their marginal entropy is intractable to compute. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, which reduces computational complexity.
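
For reference, a common textbook form of the maximum entropy objective mentioned above is sketched below. The discount factor \gamma, the temperature \alpha, and the notation are conventional choices assumed here, not taken from this page, and the block shows the standard MaxEnt-RL objective rather than DIME's lower bound:

\[
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big].
\]

For a Gaussian policy, \mathcal{H} has a closed form. For a diffusion policy, \pi(a \mid s) is the marginal of the final denoising step, obtained by integrating over all intermediate latent variables of the chain, so \log \pi(a \mid s) cannot be evaluated directly. This is the intractability of the marginal entropy referred to above.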
