Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP

Abstract

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better performance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.
