Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

Abstract

We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure, rather than a linear function. CURL has garnered recent attention for its ability to represent many important applications, including standard RL as well as imitation learning, pure exploration, constrained MDPs, offline RL, human-regularized RL, and others. Inverse reinforcement learning (IRL) is a powerful paradigm that focuses on recovering an unknown reward function that can rationalize the observed behaviour of an agent. There have been recent theoretical advances in inverse RL where the problem is formulated as identifying the set of feasible reward functions. However, inverse RL for CURL problems has not been considered previously. In this paper we show that most of the standard IRL results do not apply to CURL in general, since CURL invalidates the classical Bellman equations. This calls for a new theoretical framework for the inverse CURL (I-CURL) problem. Using a recent equivalence result between CURL and mean-field games, we propose a new definition of the feasible rewards for I-CURL by proving that this problem is equivalent to an inverse game theory problem in a subclass of mean-field games. We present initial query and sample complexity results for the I-CURL problem under assumptions such as Lipschitz continuity. Finally, we outline future directions and applications in human-AI collaboration enabled by our results.
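
For concreteness, the contrast between the standard RL objective and the CURL objective can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: \mu^{\pi} denotes the state occupancy measure induced by a policy \pi, r a reward vector, and F a concave utility. The entropy utility shown last is one commonly used instance for pure exploration, not necessarily the form used by the authors.

```latex
% Standard RL: the objective is linear in the occupancy measure.
\[
  \max_{\pi} \; \langle r, \mu^{\pi} \rangle
  \;=\; \max_{\pi} \; \sum_{s} \mu^{\pi}(s)\, r(s)
\]
% CURL: the linear functional is replaced by a concave function F.
\[
  \max_{\pi} \; F(\mu^{\pi}), \qquad F \text{ concave}
\]
% Illustrative instance (assumed): entropy of the occupancy measure,
% which rewards visiting states as uniformly as possible.
\[
  F(\mu) \;=\; -\sum_{s} \mu(s) \log \mu(s)
\]
```

When F is linear, the CURL objective reduces to the standard one. When F is not linear, the objective no longer decomposes into a sum of per-state rewards, which is the property the classical Bellman equations rely on.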
