Hybrid Inverse Reinforcement Learning

Abstract

The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches. On the other hand, it requires that the learner repeatedly solve a computationally expensive reinforcement learning (RL) problem. Often, much of this computation is wasted searching over policies very dissimilar to the expert's. In this work, we propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration. Intuitively, the expert data focuses the learner on good states during training, which reduces the amount of exploration required to compute a strong policy. Notably, such an approach doesn't need the ability to reset the learner to arbitrary states in the environment, a requirement of prior work in efficient inverse RL. More formally, we derive a reduction from inverse RL to expert-competitive RL (rather than globally optimal RL) that allows us to dramatically reduce interaction during the inner policy search loop while maintaining the benefits of the IRL approach. This allows us to derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees. Empirically, we find that our approaches are significantly more sample efficient than standard inverse RL and several other baselines on a suite of continuous control tasks.
