Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

Abstract

A significant challenge for the practical application of reinforcement learning in the real world is the need to specify an oracle reward function that correctly defines a task. Inverse reinforcement learning (IRL) seeks to avoid this challenge by instead inferring a reward function from expert behavior. While appealing, it can be impractically expensive to collect datasets of demonstrations that cover the variation common in the real world (e.g. opening any type of door). Thus in practice, IRL must commonly be performed with only a limited set of demonstrations where it can be exceedingly difficult to unambiguously recover a reward function. In this work, we exploit the insight that demonstrations from other tasks can be used to constrain the set of possible reward functions by learning a "prior" that is specifically optimized for the ability to infer expressive reward functions from limited numbers of demonstrations. We demonstrate that our method can efficiently recover rewards from images for novel tasks and provide intuition as to how our approach is analogous to learning a prior.
