Diffusion-Reward Adversarial Imitation Learning

Abstract

Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy learning to imitate expert behaviors and discriminator learning to distinguish the expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, we propose Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more robust and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator, and design diffusion rewards based on the classifier's output for policy learning. Extensive experiments are conducted in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualized learned reward functions of GAIL and DRAIL suggest that DRAIL can produce more robust and smoother rewards. Project page: https://nturobotlearninglab.github.io/DRAIL/
