Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

Abstract

Providing a suitable reward function to reinforcement learning can be difficult in many real world applications. While inverse reinforcement learning (IRL) holds promise for automatically learning reward functions from demonstrations, several major challenges remain. First, existing IRL methods learn reward functions from scratch, requiring large numbers of demonstrations to correctly infer the reward for each task the agent may need to perform. Second, existing methods typically assume homogeneous demonstrations for a single behavior or task, while in practice, it might be easier to collect datasets of heterogeneous but related behaviors. To this end, we propose a deep latent variable model that is capable of learning rewards from demonstrations of distinct but related tasks in an unsupervised way. Critically, our model can infer rewards for new, structurally-similar tasks from a single demonstration. Our experiments on multiple continuous control tasks demonstrate the effectiveness of our approach compared to state-of-the-art imitation and inverse reinforcement learning methods.
