Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Abstract

Deep reinforcement learning (RL) algorithms can use high-capacity deep networks to learn directly from image observations. However, these kinds of observation spaces present a number of challenges in practice, since the policy must now solve two problems: a representation learning problem, and a task learning problem. In this paper, we aim to explicitly learn representations that can accelerate reinforcement learning from images. We propose the stochastic latent actor-critic (SLAC) algorithm: a sample-efficient and high-performing RL algorithm for learning policies for complex continuous control tasks directly from high-dimensional image inputs. SLAC learns a compact latent representation space using a stochastic sequential latent variable model, and then learns a critic model within this latent space. By learning a critic within a compact state space, SLAC can learn much more efficiently than standard RL methods. The proposed model improves performance substantially over alternative representations as well, such as variational autoencoders. In fact, our experimental evaluation demonstrates that the sample efficiency of our resulting method is comparable to that of model-based RL methods that directly use a similar type of model for control. Furthermore, our method outperforms both model-free and model-based alternatives in terms of final performance and sample efficiency, on a range of difficult image-based control tasks. Our code and videos of our results are available at our website.
