RIZE: Regularized Imitation Learning via Distributional Reinforcement Learning

Abstract

We introduce a novel Inverse Reinforcement Learning (IRL) approach that overcomes limitations of fixed reward assignments and constrained flexibility in implicit reward regularization. By extending the Maximum Entropy IRL framework with a squared temporal-difference (TD) regularizer and adaptive targets, dynamically adjusted during training, our method indirectly optimizes a reward function while incorporating reinforcement learning principles. Furthermore, we integrate distributional RL to capture richer return information. Our approach achieves state-of-the-art performance on challenging MuJoCo tasks, demonstrating expert-level results on the Humanoid task with only 3 demonstrations. Extensive experiments and ablation studies validate the effectiveness of our method, providing insights into adaptive targets and reward dynamics in imitation learning.
