Inverse Reinforcement Learning without Reinforcement Learning

Abstract

Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: we have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL. Another thread of work has proved that access to the side-information of the distribution of states where a strong policy spends time can dramatically reduce the sample and computational complexities of solving an RL problem. In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory. In practice, we find that we are able to significantly speed up the prior art on continuous control tasks.
