On the Correctness and Sample Complexity of Inverse Reinforcement Learning

Abstract

Inverse reinforcement learning (IRL) is the problem of finding a rewardfunction that generates a given optimal policy for a given Markov DecisionProcess. This paper looks at an algorithmic-independent geometric analysis ofthe IRL problem with finite states and actions. A L1-regularized Support VectorMachine formulation of the IRL problem motivated by the geometric analysis isthen proposed with the basic objective of the inverse reinforcement problem inmind: to find a reward function that generates a specified optimal policy. Thepaper further analyzes the proposed formulation of inverse reinforcementlearning with $n$ states and $k$ actions, and shows a sample complexity of$O(n^2 \log (nk))$ for recovering a reward function that generates a policythat satisfies Bellman's optimality condition with respect to the truetransition probabilities.

Quick Read (beta)

loading the full paper ...