Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation

Abstract

Modern reinforcement learning algorithms reach super-human performance onmany board and video games, but they are sample inefficient, i.e. theytypically require significantly more playing experience than humans to reach anequal performance level. To improve sample efficiency, an agent may build amodel of the environment and use planning methods to update its policy. In thisarticle we introduce Variational State Tabulation (VaST), which maps anenvironment with a high-dimensional state space (e.g. the space of visualinputs) to an abstract tabular model. Prioritized sweeping with small backups,a highly efficient planning method, can then be used to update state-actionvalues. We show how VaST can rapidly learn to maximize reward in tasks like 3Dnavigation and efficiently adapt to sudden changes in rewards or transitionprobabilities.

Quick Read (beta)

loading the full paper ...