Posterior Sampling for Deep Reinforcement Learning

Abstract

Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an environment model that can be used for planning. Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. PSDRL combines efficient uncertainty quantification over latent state space models with a specially tailored continual planning algorithm based on value-function approximation. Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency.
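For orientation, the tabular Posterior Sampling for Reinforcement Learning idea that PSDRL scales up can be sketched as: keep a posterior over environment models, draw one model at the start of each episode, plan in that sampled model, and act greedily with respect to it for the whole episode. The following is a minimal sketch under stated assumptions, not the PSDRL method: it assumes a small discrete MDP with known rewards, an independent Dirichlet(1, ..., 1) prior on each transition row, and exact discounted value iteration as the planner. The helper names (`sample_mdp`, `value_iteration`, `run_psrl`) are hypothetical. According to the abstract, PSDRL instead quantifies uncertainty over latent state space models and plans with value-function approximation.

```python
import numpy as np


def sample_mdp(counts, rng):
    """Draw one transition model P[s, a, :] from independent Dirichlet posteriors."""
    num_states, num_actions, _ = counts.shape
    P = np.empty(counts.shape)
    for s in range(num_states):
        for a in range(num_actions):
            P[s, a] = rng.dirichlet(counts[s, a])
    return P


def value_iteration(P, R, gamma=0.95, iters=500):
    """Return Q-values of the discounted MDP (P, R) after `iters` Bellman backups."""
    Q = R.astype(float)
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return Q


def run_psrl(P_true, R, episodes=200, horizon=20, gamma=0.95, seed=0):
    """Hypothetical tabular PSRL loop; rewards R are assumed known, transitions unknown."""
    rng = np.random.default_rng(seed)
    num_states, num_actions = R.shape
    counts = np.ones((num_states, num_actions, num_states))  # Dirichlet(1, ..., 1) prior
    total_reward = 0.0
    for _ in range(episodes):
        P_hat = sample_mdp(counts, rng)           # one posterior sample per episode
        Q = value_iteration(P_hat, R, gamma)      # plan in the sampled model
        s = 0
        for _ in range(horizon):
            a = int(Q[s].argmax())                # act greedily w.r.t. the sampled model
            s_next = int(rng.choice(num_states, p=P_true[s, a]))
            counts[s, a, s_next] += 1             # conjugate Dirichlet posterior update
            total_reward += R[s, a]
            s = s_next
    return counts, total_reward


if __name__ == "__main__":
    # Illustrative 2-state, 2-action MDP (made up for this sketch).
    P_true = np.array([[[1.0, 0.0], [0.5, 0.5]],
                       [[0.1, 0.9], [1.0, 0.0]]])
    R = np.array([[0.1, 0.0], [1.0, 0.0]])
    counts, total = run_psrl(P_true, R)
    print("posterior mean transitions:\n", counts / counts.sum(axis=2, keepdims=True))
    print("total reward:", total)
```

Sampling once per episode, rather than at every step, makes the agent commit to a single plausible model for a whole episode; this is the property the tabular analysis of posterior sampling relies on, and it is what a scalable approximation has to preserve while replacing the exact posterior and planner.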
