Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Abstract

As a marriage between offline RL and meta-RL, offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and adapt quickly while acquiring knowledge safely. Among OMRL approaches, context-based OMRL (COMRL) is a popular paradigm that aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. This theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures. This work lays the information-theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
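
For readers unfamiliar with the objective, the following is a brief reminder of the standard definition of mutual information and of one well-known variational lower bound on it (the Barber-Agakov bound). It is offered only as background: the auxiliary distribution $q(m \mid z)$ is an illustrative assumption, and the specific bounds analyzed in the paper are not stated in the abstract and may differ.

$$
I(Z; M) \;=\; \mathbb{E}_{p(z, m)}\!\left[\log \frac{p(z, m)}{p(z)\,p(m)}\right] \;=\; H(M) - H(M \mid Z)
$$

$$
I(Z; M) \;\ge\; H(M) + \mathbb{E}_{p(z, m)}\!\left[\log q(m \mid z)\right]
$$

The inequality holds for any distribution $q(m \mid z)$, because the gap equals the expected KL divergence $\mathbb{E}_{p(z)}\!\left[\mathrm{KL}\big(p(m \mid z)\,\|\,q(m \mid z)\big)\right] \ge 0$. Maximizing the right-hand side over a learned $q$ is one way a task-prediction loss can serve as a tractable surrogate for $I(Z; M)$.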
