Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning

Abstract

Inverse reinforcement learning (IRL) aims to infer a reward from expert demonstrations, motivated by the idea that the reward, rather than the policy, is the most succinct and transferable description of a task [Ng et al., 2000]. However, the reward corresponding to an optimal policy is not unique, making it unclear if an IRL-learned reward is transferable to new transition laws in the sense that its optimal policy aligns with the optimal policy corresponding to the expert's true reward. Past work has addressed this problem only under the assumption of full access to the expert's policy, guaranteeing transferability when learning from two experts with the same reward but different transition laws that satisfy a specific rank condition [Rolland et al., 2022]. In this work, we show that the conditions developed under full access to the expert's policy cannot guarantee transferability in the more practical scenario where we have access only to demonstrations of the expert. Instead of a binary rank condition, we propose principal angles as a more refined measure of similarity and dissimilarity between transition laws. Based on this, we then establish two key results: 1) a sufficient condition for transferability to any transition laws when learning from at least two experts with sufficiently different transition laws, and 2) a sufficient condition for transferability to local changes in the transition law when learning from a single expert. Furthermore, we also provide a probably approximately correct (PAC) algorithm and an end-to-end analysis for learning transferable rewards from demonstrations of multiple experts.
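
For readers unfamiliar with principal angles, the following is a minimal sketch of the standard linear-algebra computation between two subspaces. The abstract does not say which subspaces are associated with each transition law, so the sketch assumes, purely for illustration, that each law is summarized by a full-column-rank matrix (the hypothetical inputs `A` and `B`) whose column space is the object being compared. The function name `principal_angles` is likewise an illustrative choice, not something taken from the paper.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Assumption: A and B have the same number of rows and full column rank.
    """
    # Orthonormal bases for the two column spaces (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))
```

Identical subspaces give angles that are all zero, and mutually orthogonal subspaces give angles of pi/2; intermediate values grade how far apart two subspaces are, which is the kind of information a yes/no rank test does not expose.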
