Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning

Abstract

Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest in poor sample efficiency. We present $k$-Step Latent (KSL), a new representation learning method that enforces temporal consistency of representations via a self-supervised auxiliary task wherein agents learn to recurrently predict action-conditioned representations of the state space. The state encoder learned by KSL produces low-dimensional representations that make optimization of the RL task more sample efficient. Altogether, KSL produces state-of-the-art results in both data efficiency and asymptotic performance in the popular PlaNet benchmark suite. Our analyses show that KSL produces encoders that generalize better to new tasks unseen during training, and its representations are more strongly tied to reward, are more invariant to perturbations in the state space, and move more smoothly through the temporal axis of the RL problem than other methods such as DrQ, RAD, CURL, and SAC-AE.
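To make the auxiliary task concrete, the following is a minimal sketch of a $k$-step, action-conditioned latent-consistency loss. It is an illustration under stated assumptions, not the KSL implementation: the abstract does not specify the architecture or loss, so the linear maps standing in for neural networks, the separate target encoder, the normalized squared-error distance, and every name (`k_step_latent_loss`, `W_enc`, `W_tgt`, `W_trans`) are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def normalize(v):
    """Scale v to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def k_step_latent_loss(obs_seq, act_seq, W_enc, W_tgt, W_trans):
    """Average squared distance between predicted and target latents.

    obs_seq: k+1 observation vectors o_t .. o_{t+k}
    act_seq: k action vectors a_t .. a_{t+k-1}
    W_enc, W_tgt, W_trans: hypothetical linear stand-ins for the online
    encoder, a target encoder, and an action-conditioned transition model.
    """
    assert len(obs_seq) == len(act_seq) + 1
    # Only the first observation is encoded; later latents are predicted.
    z = matvec(W_enc, obs_seq[0])
    total = 0.0
    for step, action in enumerate(act_seq, start=1):
        # Recurrent, action-conditioned prediction from the concatenation [z, a].
        z = matvec(W_trans, z + action)
        # Target latent for the observation actually seen at this step.
        target = matvec(W_tgt, obs_seq[step])
        p, q = normalize(z), normalize(target)
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(act_seq)
```

In this sketch the prediction at each step is fed back into the transition model, so errors compound over the $k$ steps; minimizing the loss therefore pushes the encoder toward latents whose evolution is predictable from the actions taken.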
