Attention Privileged Reinforcement Learning For Domain Transfer

Abstract

Applying reinforcement learning (RL) to physical systems presents notable challenges, given requirements regarding sample efficiency, safety, and physical constraints compared to simulated environments. To enable transfer of policies trained in simulation, randomising simulation parameters leads to more robust policies, but also significantly extends training time. In this paper, we exploit access to privileged information (such as environment states) often available in simulation, in order to improve and accelerate learning over randomised environments. We introduce Attention Privileged Reinforcement Learning (APRiL), which equips the agent with an attention mechanism and makes use of state information in simulation, learning to align attention between state- and image-based policies while additionally sharing generated data. During deployment we can apply the image-based policy to remove the requirement of access to additional information. We experimentally demonstrate accelerated and more robust learning on a number of diverse domains, leading to improved final performance for environments both within and outside the training distribution.
