Receding Horizon Inverse Reinforcement Learning

Abstract

Inverse reinforcement learning (IRL) seeks to infer a cost function that explains the underlying goals and preferences of expert demonstrations. This paper presents receding horizon inverse reinforcement learning (RHIRL), a new IRL algorithm for high-dimensional, noisy, continuous systems with black-box dynamic models. RHIRL addresses two key challenges of IRL: scalability and robustness. To handle high-dimensional continuous systems, RHIRL matches the induced optimal trajectories with expert demonstrations locally in a receding horizon manner and 'stitches' together the local solutions to learn the cost; it thereby avoids the 'curse of dimensionality'. This contrasts sharply with earlier algorithms that match with expert demonstrations globally over the entire high-dimensional state space. To be robust against imperfect expert demonstrations and system control noise, RHIRL learns a state-dependent cost function 'disentangled' from system dynamics under mild conditions. Experiments on benchmark tasks show that RHIRL outperforms several leading IRL algorithms in most instances. We also prove that the cumulative error of RHIRL grows linearly with the task duration.
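
The receding-horizon matching described above can be illustrated with a minimal sketch. It is not the paper's algorithm, and every piece of it is an assumption made for illustration. The simulator is a hypothetical 1-D point (`step`) that is only ever queried, never inspected, as a black-box model would be. The cost is linear and state-only, `theta . features(x)`, with hypothetical quadratic features. A fixed set of constant-control candidates (`CANDIDATE_CONTROLS`) stands in crudely for a real sampling-based planner, and a maximum-entropy-style feature-matching gradient with a hypothetical temperature `lam` drives learning. Each window starts at an expert state and compares the planner's soft-optimal rollouts with the next `horizon` expert states. The shared cost parameters are then updated window by window as the horizon slides along the demonstration.

```python
import math

# Hypothetical candidate controls: a crude stand-in for a sampling-based
# planner. Each candidate applies one constant control over the horizon.
CANDIDATE_CONTROLS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def step(x, u):
    """Hypothetical black-box simulator: 1-D point, control clipped to [-1, 1]."""
    u = max(-1.0, min(1.0, u))
    return x + 0.5 * u


def features(x):
    """Hypothetical state features; the cost is c(x) = theta . features(x)."""
    return [x * x, x]


def rollout_features(x0, controls):
    """Sum of state features along a rollout of the simulator from x0."""
    total = [0.0, 0.0]
    x = x0
    for u in controls:
        x = step(x, u)
        f = features(x)
        total = [total[0] + f[0], total[1] + f[1]]
    return total


def local_gradient(theta, x_t, expert_segment, lam=1.0):
    """Feature-matching gradient for one receding-horizon window.

    The window starts at the expert state x_t and compares the planner's
    soft-optimal rollouts with the next len(expert_segment) expert states.
    """
    horizon = len(expert_segment)
    phis = [rollout_features(x_t, [u] * horizon) for u in CANDIDATE_CONTROLS]
    costs = [theta[0] * p[0] + theta[1] * p[1] for p in phis]
    # Boltzmann weights exp(-cost / lam), shifted by the minimum for stability.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    z = sum(weights)
    learner = [sum(w * p[i] for w, p in zip(weights, phis)) / z for i in range(2)]
    expert = [sum(features(x)[i] for x in expert_segment) for i in range(2)]
    # Max-ent style gradient of the expert log-likelihood with respect to theta:
    # expected learner features minus observed expert features.
    return [learner[i] - expert[i] for i in range(2)]


def learn_cost(expert_states, horizon=3, lr=0.01, epochs=50):
    """Slide the window along the demonstration, updating one shared cost."""
    theta = [0.0, 0.0]
    for _ in range(epochs):
        for t in range(len(expert_states) - horizon):
            segment = expert_states[t + 1 : t + 1 + horizon]
            g = local_gradient(theta, expert_states[t], segment)
            theta = [theta[i] + lr * g[i] for i in range(2)]
    return theta


# Toy demonstration: the expert moves from 0 toward 2, then holds still.
demo = [0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]
theta = learn_cost(demo)
print("learned cost weights:", theta)
```

In this sketch, every window restarts from a demonstrated state, so the learner is never asked to reproduce the expert over the whole trajectory at once. A practical version would replace the candidate set with a proper stochastic planner and the linear cost with a learned function approximator.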
