Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning

Abstract

While Reinforcement Learning (RL) aims to train an agent from a reward function in a given environment, Inverse Reinforcement Learning (IRL) seeks to recover the reward function from observing an expert's behavior. It is well known that, in general, various reward functions can lead to the same optimal policy, and hence IRL is ill-defined. However, Cao et al. (2021) showed that, if we observe two or more experts with different discount factors or acting in different environments, the reward function can under certain conditions be identified up to a constant. This work starts by showing an equivalent identifiability statement from multiple experts in tabular MDPs based on a rank condition, which is easily verifiable and is shown to be also necessary. We then extend our result to various scenarios: we characterize reward identifiability in the case where the reward function can be represented as a linear combination of given features, making it more interpretable, or when we have access to approximate transition matrices. Even when the reward is not identifiable, we provide conditions characterizing when data on multiple experts in a given environment allows us to generalize and train an optimal agent in a new environment. Our theoretical results on reward identifiability and generalizability are validated in various numerical experiments.
