Attention-Privileged Reinforcement Learning

Abstract

Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning rate and performance, and requires knowledge of potential variations during deployment. In this paper, we introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a self-supervised attention mechanism to significantly alleviate these drawbacks: by focusing on task-relevant aspects of the observations, attention provides robustness to distractors as well as significantly increased learning efficiency. APRiL trains two attention-augmented actor-critic agents: one purely based on image observations, available across training and transfer domains; and one with access to privileged information (such as environment states) available only during training. Experience is shared between both agents and their attention mechanisms are aligned. The image-based policy can then be deployed without access to privileged information. We experimentally demonstrate accelerated and more robust learning on a diverse set of domains, leading to improved final performance for environments both within and outside the training distribution.
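
The two mechanisms named above, shared experience and aligned attention, can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not specify the alignment objective, so the mean-squared-error loss below is an assumption, as is the premise that both attention maps are already expressed over the same set of locations. All names (`SharedReplayBuffer`, `alignment_loss`, the transition keys) are hypothetical.

```python
import math
import random
from collections import deque


def softmax(scores):
    """Normalise raw scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(image_attention, privileged_attention):
    """Mean squared difference between two attention maps of equal length.

    Assumption: both maps cover the same locations. The actual alignment
    objective is not given in the abstract; MSE is a stand-in.
    """
    if len(image_attention) != len(privileged_attention):
        raise ValueError("attention maps must have the same length")
    n = len(image_attention)
    return sum((a - b) ** 2
               for a, b in zip(image_attention, privileged_attention)) / n


class SharedReplayBuffer:
    """Single buffer that both agents write to and sample from."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest entries are evicted

    def add(self, transition, source):
        # `source` records which agent collected the transition.
        self.storage.append((source, transition))

    def sample(self, batch_size, rng=random):
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)


# Both agents contribute experience to the same buffer (hypothetical data).
buffer = SharedReplayBuffer(capacity=1000)
buffer.add({"obs": "image_0", "state": [0.1, 0.2], "action": 0, "reward": 1.0},
           source="image_agent")
buffer.add({"obs": "image_1", "state": [0.3, 0.1], "action": 1, "reward": 0.0},
           source="privileged_agent")

# Identical attention maps give zero loss; differing maps give a positive loss.
image_attn = softmax([2.0, 0.5, 0.1, 0.1])
priv_attn = softmax([2.0, 0.5, 0.1, 0.1])
loss_same = alignment_loss(image_attn, priv_attn)
loss_diff = alignment_loss(image_attn, softmax([0.1, 0.1, 0.5, 2.0]))
```

In a full training loop, a term like `alignment_loss` would presumably be added to the image-based agent's actor-critic objective, while only the image-based policy would be kept for deployment.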
