A Novel Variational Lower Bound for Inverse Reinforcement Learning

Abstract

Inverse reinforcement learning (IRL) seeks to learn a reward function from expert trajectories in order to understand the task for imitation or collaboration, thereby removing the need for manual reward engineering. However, IRL has been particularly challenging for large, high-dimensional problems with unknown dynamics. In this paper, we present a new Variational Lower Bound for IRL (VLB-IRL), which is derived under the framework of a probabilistic graphical model with an optimality node. Our method simultaneously learns the reward function and the policy under the learned reward function by maximizing the lower bound, which is equivalent to minimizing the reverse Kullback-Leibler divergence between an approximate distribution of optimality given the reward function and the true distribution of optimality given trajectories. This yields a new IRL method that learns a valid reward function such that the policy under the learned reward achieves expert-level performance on several known domains. Importantly, the method outperforms existing state-of-the-art IRL algorithms on these domains, with its learned policy achieving higher reward.
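To make the "reverse Kullback-Leibler divergence" term concrete, the sketch below computes KL(q || p) for a binary optimality variable O, where q is an approximate distribution of optimality induced by a reward and p is a reference distribution of optimality. This is only an illustration of the divergence, not the paper's objective or training procedure. The mapping from reward to optimality probability via a sigmoid, the function names, and the per-step numbers are all hypothetical assumptions chosen for illustration.

```python
import math


def bernoulli_reverse_kl(q1: float, p1: float, eps: float = 1e-12) -> float:
    """Reverse KL, KL(q || p), between two Bernoulli distributions.

    q1 = q(O=1): approximate probability of optimality (hypothetical,
    derived from a learned reward below).
    p1 = p(O=1): reference probability of optimality (hypothetical values).
    """
    # Clip probabilities away from 0 and 1 to avoid log(0).
    q1 = min(max(q1, eps), 1.0 - eps)
    p1 = min(max(p1, eps), 1.0 - eps)
    q0, p0 = 1.0 - q1, 1.0 - p1
    # Expectation is taken under q, which is what makes this the "reverse" KL.
    return q1 * math.log(q1 / p1) + q0 * math.log(q0 / p0)


def sigmoid(x: float) -> float:
    # Hypothetical choice: squash a scalar reward into P(O=1).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical per-step learned rewards and reference optimality probabilities.
learned_rewards = [2.0, -1.0, 0.5]
reference_optimality = [0.9, 0.2, 0.6]

# Mean reverse KL over steps; smaller values mean the reward-induced
# optimality distribution is closer to the reference one.
mean_kl = sum(
    bernoulli_reverse_kl(sigmoid(r), p)
    for r, p in zip(learned_rewards, reference_optimality)
) / len(learned_rewards)
print(f"mean reverse KL: {mean_kl:.4f}")
```

Note that KL(q || p) is asymmetric: swapping the arguments generally gives a different value, which is why the direction ("reverse") matters when the approximate distribution is the one being fitted.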
