Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation

Abstract

Modern reinforcement learning algorithms reach super-human performance in many board and video games, but they are sample inefficient, i.e., they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce VaST (Variational State Tabulation), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular environment. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.
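
To illustrate the kind of planning that becomes possible once the environment is tabular, the following is a minimal sketch of classic prioritized sweeping with full backups on a small known model. It is not the small-backups variant named above, and it is not the VaST implementation. The function name `prioritized_sweeping`, the dictionaries `transitions` and `rewards`, and the parameters `gamma` and `theta` are assumptions introduced only for this example.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(transitions, rewards, n_actions,
                         gamma=0.9, theta=1e-6, max_updates=10_000):
    """Classic prioritized sweeping on a known tabular model (illustrative).

    transitions[(s, a)] -> {next_state: probability}   (hypothetical layout)
    rewards[(s, a)]     -> expected immediate reward   (hypothetical layout)
    States without any entry in `transitions` are treated as terminal.
    """
    q = {}

    # Map each state to the state-action pairs that can lead into it.
    predecessors = defaultdict(set)
    for (s, a), dist in transitions.items():
        for s_next in dist:
            predecessors[s_next].add((s, a))

    def value(s):
        # Greedy state value; unseen pairs default to 0.
        return max(q.get((s, a), 0.0) for a in range(n_actions))

    def backup(s, a):
        # Full expected Bellman backup for one state-action pair.
        expected_next = sum(p * value(s2)
                            for s2, p in transitions[(s, a)].items())
        return rewards[(s, a)] + gamma * expected_next

    # Seed the queue with every pair whose value would change noticeably.
    queue = []
    for sa in transitions:
        error = abs(backup(*sa) - q.get(sa, 0.0))
        if error > theta:
            heapq.heappush(queue, (-error, sa))  # max-priority via negation

    for _ in range(max_updates):
        if not queue:
            break
        _, (s, a) = heapq.heappop(queue)
        q[(s, a)] = backup(s, a)
        # A change at s can only affect pairs that transition into s.
        for ps, pa in predecessors[s]:
            error = abs(backup(ps, pa) - q.get((ps, pa), 0.0))
            if error > theta:
                heapq.heappush(queue, (-error, (ps, pa)))
    return q


# Toy chain 0 -> 1 -> 2 (terminal), one action, reward 1 on the last step.
toy_transitions = {(0, 0): {1: 1.0}, (1, 0): {2: 1.0}}
toy_rewards = {(0, 0): 0.0, (1, 0): 1.0}
toy_q = prioritized_sweeping(toy_transitions, toy_rewards,
                             n_actions=1, gamma=0.5)
```

In the toy chain, the reward is first backed up at state 1 and then propagated to its only predecessor, state 0, so just two updates are performed. Ordering updates by the size of the value change is what lets a change in rewards or transitions spread through the table without sweeping every state.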
