On the Effective Horizon of Inverse Reinforcement Learning

Abstract

Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function and then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of reward estimates and the computational efficiency of IRL algorithms. Interestingly, an \emph{effective time horizon} shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis serves as a guide for the principled choice of the effective horizon for IRL. It also prompts us to re-examine the classic IRL formulation: it is more natural to learn jointly the reward and the effective horizon rather than the reward alone with a given horizon. To validate our findings, we implement a cross-validation extension, and the experimental results confirm the theoretical analysis.
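To make the role of the horizon concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small tabular MDP with known transitions, holds a candidate reward fixed, and picks the horizon whose greedy policy best agrees with held-out expert state-action pairs. The function names `greedy_policy` and `select_horizon`, the array layout, and the agreement score are all illustrative assumptions; a joint procedure would also re-estimate the reward for each candidate horizon.

```python
import numpy as np


def greedy_policy(P, r, horizon):
    """First-step greedy action per state under finite-horizon planning.

    Assumed layout (hypothetical): P[a, s, s2] is the probability of moving
    from state s to s2 under action a, and r[s2] is the reward for arriving
    in s2. A horizon of T means the planner looks T steps ahead.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    v = np.zeros(r.shape[0])
    for _ in range(horizon):
        q = P @ (r + v)    # q[a, s]: expected return of action a in state s
        v = q.max(axis=0)  # value with one more step of lookahead
    return q.argmax(axis=0)


def select_horizon(P, r, val_states, val_actions, candidates):
    """Pick the candidate horizon that best matches held-out expert actions.

    The score is the fraction of validation pairs on which the greedy policy
    agrees with the expert. Ties go to the earliest candidate, so passing
    candidates in increasing order prefers shorter horizons.
    """
    scores = {}
    for T in candidates:
        pi = greedy_policy(P, r, T)
        scores[T] = float(np.mean(pi[val_states] == val_actions))
    best = max(scores, key=scores.get)
    return best, scores
```

In this sketch, a longer horizon lets the planner distinguish more policies for the same reward, which is one way to read the claim that the horizon controls the complexity of the induced policy class. Scoring on held-out pairs rather than training pairs is what guards against choosing a horizon that merely overfits the available demonstrations.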
