Generative Temporal Models with Spatial Memory for Partially Observed Environments

Abstract

In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.
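
As a rough illustration of how a position-indexed, non-parametric memory might interact with a low-dimensional state update, consider the toy sketch below. It is not the architecture of this paper: the abstract does not specify the read and write rules, so the nearest-neighbour averaging, the additive transition, and all names (`SpatialMemory`, `transition`, `noise_std`) are assumptions made purely for illustration. The latent codes stand in for what a Variational Auto-Encoder would produce from visual observations.

```python
import math
import random


class SpatialMemory:
    """Toy non-parametric memory: a growing list of (position, code) pairs.

    Hypothetical illustration only; not the memory system of the paper.
    """

    def __init__(self):
        self.positions = []  # stored agent positions, e.g. (x, y) tuples
        self.codes = []      # latent codes written at those positions

    def write(self, position, code):
        self.positions.append(tuple(position))
        self.codes.append(list(code))

    def read(self, query, k=3):
        """Average the codes of the k stored positions nearest to `query`."""
        if not self.positions:
            raise ValueError("memory is empty")
        order = sorted(
            range(len(self.positions)),
            key=lambda i: math.dist(self.positions[i], query),
        )
        nearest = order[:k]
        dim = len(self.codes[0])
        return [
            sum(self.codes[i][d] for i in nearest) / len(nearest)
            for d in range(dim)
        ]


def transition(position, action, noise_std=0.0, rng=random):
    """Assumed state-space step: next position = position + action + noise."""
    return tuple(
        p + a + rng.gauss(0.0, noise_std) for p, a in zip(position, action)
    )


# Usage: move the agent, writing one (placeholder) latent code per step.
memory = SpatialMemory()
pos = (0.0, 0.0)
for action, code in [((1.0, 0.0), [1.0, 0.0]), ((0.0, 1.0), [0.0, 1.0])]:
    pos = transition(pos, action)
    memory.write(pos, code)

# Query near the first visited location; a decoder would map this code
# back to a predicted observation.
retrieved = memory.read((1.0, 0.1), k=1)
```

Because the memory grows by appending entries rather than by updating fixed-size weights, its capacity scales with the number of steps stored, which is the sense of "non-parametric" assumed in this sketch.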
