Efficient Exploration through Bayesian Deep Q-Networks

Abstract

We propose the Bayesian Deep Q-Network (BDQN), a practical Thompson sampling-based reinforcement learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling, but it is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network, through a Bayesian linear regression (BLR) model. This layer can be trained with fast closed-form updates, and its samples can be drawn efficiently from a Gaussian distribution. We apply our method to a wide range of Atari games in the Arcade Learning Environment. Because BDQN explores more efficiently, it reaches higher rewards substantially faster than a key baseline, the double deep Q-network (DDQN).
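The closed-form BLR output layer and Gaussian posterior sampling described above can be illustrated with a minimal sketch. It assumes fixed feature vectors `phi` standing in for the network's last hidden layer, one linear weight vector per action, a Gaussian prior N(0, σ_p² I), and Gaussian observation noise with variance σ_n². The class `BayesianLinearHead`, its methods, and the regression targets are hypothetical illustrations, not the authors' implementation. In an RL setting the targets would come from the Q-learning update, which is omitted here.

```python
import numpy as np


class BayesianLinearHead:
    """Hypothetical sketch: per-action Bayesian linear regression on fixed features."""

    def __init__(self, num_actions, feature_dim, prior_var=1.0, noise_var=1.0, seed=0):
        self.num_actions = num_actions
        self.d = feature_dim
        self.prior_var = prior_var  # sigma_p^2 (assumed prior variance)
        self.noise_var = noise_var  # sigma_n^2 (assumed noise variance)
        self.rng = np.random.default_rng(seed)
        # Sufficient statistics per action: Phi^T Phi and Phi^T y.
        self.phi_t_phi = np.zeros((num_actions, feature_dim, feature_dim))
        self.phi_t_y = np.zeros((num_actions, feature_dim))
        # Before any data, the posterior equals the prior N(0, sigma_p^2 I).
        self.mean = np.zeros((num_actions, feature_dim))
        self.cov = np.tile(prior_var * np.eye(feature_dim), (num_actions, 1, 1))

    def add(self, phi, action, target):
        """Accumulate one (feature, target) pair for the chosen action."""
        self.phi_t_phi[action] += np.outer(phi, phi)
        self.phi_t_y[action] += phi * target

    def update_posterior(self):
        """Closed-form Gaussian posterior over each action's weight vector."""
        for a in range(self.num_actions):
            precision = self.phi_t_phi[a] / self.noise_var + np.eye(self.d) / self.prior_var
            cov = np.linalg.inv(precision)
            cov = (cov + cov.T) / 2.0  # symmetrize against round-off
            self.cov[a] = cov
            self.mean[a] = cov @ self.phi_t_y[a] / self.noise_var

    def sample_weights(self):
        """Thompson step: draw one weight vector per action from its posterior."""
        return np.stack([
            self.rng.multivariate_normal(self.mean[a], self.cov[a])
            for a in range(self.num_actions)
        ])

    def act(self, phi, weights):
        """Greedy action under the sampled weights (sampled Q-values = weights @ phi)."""
        return int(np.argmax(weights @ phi))


# Usage: fit action 0 on noise-free targets y = w^T phi with w = [1, 2].
rng = np.random.default_rng(1)
head = BayesianLinearHead(num_actions=2, feature_dim=2)
true_w = np.array([1.0, 2.0])
for _ in range(500):
    phi = rng.standard_normal(2)
    head.add(phi, action=0, target=float(true_w @ phi))
head.update_posterior()
weights = head.sample_weights()
action = head.act(np.array([1.0, 0.5]), weights)
```

Action 0's posterior mean moves close to the true weights and its covariance shrinks. Action 1 has seen no data, so it keeps the broad prior. Under this sketch, that unshrunk prior is what makes rarely tried actions more likely to be sampled as the argmax, which is the targeted exploration the abstract refers to.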
