Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

Abstract

We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Similar to how IRL trains a policy based on the reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Diffusion by Maximum Entropy IRL (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution. The entropy maximization plays a key role in DxMI, facilitating the exploration of the diffusion model and ensuring the convergence of the EBM. We also propose Diffusion by Dynamic Programming (DxDP), a novel reinforcement learning algorithm for diffusion models, as a subroutine in DxMI. DxDP makes the diffusion model update in DxMI efficient by transforming the original problem into an optimal control formulation where value functions replace back-propagation in time. Our empirical studies show that diffusion models fine-tuned using DxMI can generate high-quality samples in as few as 4 and 10 steps. Additionally, DxMI enables the training of an EBM without MCMC, stabilizing EBM training dynamics and enhancing anomaly detection performance.
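
The minimax structure described above can be illustrated with the generic maximum entropy IRL objective, written with an energy in place of a negative reward. This is a sketch under assumed notation, not necessarily the exact objective used in the paper: $p$ denotes the data distribution, $\pi$ the sample distribution of the diffusion model, $E$ the energy of the EBM, and $\mathcal{H}$ the differential entropy; none of these symbols appear in the abstract.

```latex
% Generic maximum entropy IRL objective (assumed notation).
\min_{E} \max_{\pi} \;
  \mathbb{E}_{x \sim p}\left[ E(x) \right]
  - \mathbb{E}_{x \sim \pi}\left[ E(x) \right]
  + \mathcal{H}(\pi)

% For a fixed E, the inner maximum is attained at the Gibbs distribution:
\pi_E(x) = \frac{\exp(-E(x))}{Z_E},
\qquad Z_E = \int \exp(-E(x)) \, dx,
\qquad \max_{\pi} \left\{ -\mathbb{E}_{\pi}[E] + \mathcal{H}(\pi) \right\} = \log Z_E

% The outer problem then reduces to maximum likelihood for the EBM:
\min_{E} \; \mathbb{E}_{p}\left[ E(x) \right] + \log Z_E
  = \min_{E} \; -\mathbb{E}_{x \sim p}\left[ \log \pi_E(x) \right]
```

Under these assumptions the equilibrium is $\pi = p \propto \exp(-E)$, which matches the statement that both models converge to the data distribution. The entropy term is what keeps the inner maximum from collapsing onto the minimum-energy points.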
