Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning

Abstract

Meta-Reinforcement Learning (Meta-RL) agents can struggle to operate across tasks with varying environmental features that require different optimal skills (i.e., different modes of behaviours). Using context encoders based on contrastive learning to enhance the generalisability of Meta-RL agents is now widely studied but faces challenges such as the requirement for a large sample size, also referred to as the $\log$-$K$ curse. To improve RL generalisation to different tasks, we first introduce Skill-aware Mutual Information (SaMI), an optimisation objective that aids in distinguishing context embeddings according to skills, thereby equipping RL agents with the ability to identify and execute different skills across tasks. We then propose Skill-aware Noise Contrastive Estimation (SaNCE), a $K$-sample estimator used to optimise the SaMI objective. We provide a framework for equipping an RL agent with SaNCE in practice and conduct experimental validation on modified MuJoCo and Panda-gym benchmarks. We empirically find that RL agents that learn by maximising SaMI achieve substantially improved zero-shot generalisation to unseen tasks. Additionally, the context encoder equipped with SaNCE demonstrates greater robustness to reductions in the number of available samples, thus possessing the potential to overcome the $\log$-$K$ curse.
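
As background for the $\log$-$K$ curse mentioned above, the following is a minimal sketch of the standard InfoNCE estimate, not of SaNCE itself, whose definition is not given in the abstract. It illustrates why a $K$-sample contrastive estimate cannot exceed $\log K$, so that a small sample size caps the mutual information the estimator can report. The function names `info_nce` and `perfect_critic`, and the score values, are hypothetical and chosen only for illustration.

```python
import math

def info_nce(scores):
    """Standard InfoNCE estimate from a K x K critic score matrix.

    scores[i][j] is an assumed critic value f(x_i, y_j); the diagonal
    entries are the positive pairs, the rest serve as negatives.
    """
    k = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Numerically stable log-sum-exp over the row.
        m = max(row)
        lse = m + math.log(sum(math.exp(s - m) for s in row))
        # Each term is <= 0 because row[i] is one of the summands.
        total += row[i] - lse
    return math.log(k) + total / k

def perfect_critic(k, gap=50.0):
    """Idealised scores: positives beat every negative by a wide margin."""
    return [[gap if i == j else 0.0 for j in range(k)] for i in range(k)]

for k in (4, 16, 64):
    estimate = info_nce(perfect_critic(k))
    print(f"K={k:3d}  estimate={estimate:.4f}  log K={math.log(k):.4f}")
```

Even with an idealised critic the estimate saturates at $\log K$, which is why an estimator of this form needs a large $K$ to report a large mutual information.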
