Versatile Inverse Reinforcement Learning via Cumulative Rewards

Abstract

Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
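The idea of a reward formed as a sum of iteratively trained discriminators can be illustrated with a minimal sketch. It assumes that each discriminator is a logistic-regression classifier separating expert samples from the current policy's samples, and that each contributes its logit to the cumulative reward. The 1-D toy data, the quadratic features, the names `train_logistic_discriminator` and `CumulativeReward`, and the fixed policy snapshots (which stand in for a real policy update) are all hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, List, Sequence

# A discriminator maps a sample x to a logit: high means "looks like the expert".
Discriminator = Callable[[float], float]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_discriminator(expert: Sequence[float],
                                 policy: Sequence[float],
                                 steps: int = 2000,
                                 lr: float = 0.01) -> Discriminator:
    """Hypothetical toy discriminator: logistic regression on (1, x, x^2).

    Expert samples get label 1 and policy samples label 0. The model is trained
    by full-batch gradient descent on the cross-entropy loss.
    """
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            feats = (1.0, x, x * x)
            p = _sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
            for j, fj in enumerate(feats):
                grad[j] += (p - y) * fj
        w = [wi - lr * gj / len(data) for wi, gj in zip(w, grad)]
    w0, w1, w2 = w
    return lambda x: w0 + w1 * x + w2 * x * x  # the discriminator's logit


class CumulativeReward:
    """Assumed form: reward = sum of logits of all discriminators trained so far."""

    def __init__(self) -> None:
        self.discriminators: List[Discriminator] = []

    def add(self, disc: Discriminator) -> None:
        self.discriminators.append(disc)

    def __call__(self, x: float) -> float:
        return sum(d(x) for d in self.discriminators)


# Toy usage (hypothetical data): a bimodal expert and three fixed policy
# snapshots that stand in for the output of successive policy updates.
expert = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
policy_snapshots = [[3.8, 4.0, 4.2], [2.5, 2.8, 3.0], [1.8, 2.0, 2.2]]

reward = CumulativeReward()
for policy_samples in policy_snapshots:
    # One iteration: fit a fresh discriminator against the current policy
    # and add it to the running sum. Earlier discriminators stay frozen.
    reward.add(train_logistic_discriminator(expert, policy_samples))

print(reward(-2.0), reward(4.0))
```

One plausible reading of this design is that keeping every earlier discriminator in the sum preserves information about regions that past policies visited and the expert did not. A single discriminator retrained each iteration would only describe the difference between the expert and the latest policy.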
