Decoupling Representation Learning from Reinforcement Learning

Abstract

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at https://github.com/astooke/rlpyt/rlpyt/ul.
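
As a rough illustration of the ATC idea stated above (pair each observation with one a short time later, augment both, and score the pairs with a contrastive loss), the following is a minimal sketch and not the released implementation. Everything in it is an assumption made for illustration: the function names `random_shift`, `encode`, `info_nce`, and `atc_loss`, the use of a random shift as the augmentation, the linear stand-in for the convolutional encoder, and the plain dot-product similarity. The abstract does not specify any of these details.

```python
import math
import random

def random_shift(image, pad=1, rng=random):
    """Assumed augmentation: replicate-pad a 2D image, then crop at a random offset."""
    h, w = len(image), len(image[0])
    dy, dx = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    clamp = lambda v, hi: max(0, min(hi, v))
    return [[image[clamp(y + dy - pad, h - 1)][clamp(x + dx - pad, w - 1)]
             for x in range(w)] for y in range(h)]

def encode(image, weights):
    """Hypothetical linear encoder standing in for the convolutional encoder."""
    flat = [v for row in image for v in row]
    return [sum(wi * vi for wi, vi in zip(w_row, flat)) for w_row in weights]

def info_nce(anchors, positives):
    """Contrastive loss: anchor i should score highest against positive i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        total += lse - logits[i]
    return total / len(anchors)

def atc_loss(frames, k, weights, rng=random):
    """Pair each frame with the frame k steps later, augment both, and contrast."""
    anchors, positives = [], []
    for t in range(len(frames) - k):
        anchors.append(encode(random_shift(frames[t], rng=rng), weights))
        positives.append(encode(random_shift(frames[t + k], rng=rng), weights))
    return info_nce(anchors, positives)
```

In this sketch, the other positives in the batch act as negatives for each anchor, so the loss is small only when the encoder maps temporally close observations to similar codes and temporally distant ones to dissimilar codes. Decoupling, in the sense used above, would mean updating the encoder weights from this loss alone rather than from the RL objective.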
