Large Batch Experience Replay

Abstract

Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly to speed up learning, but very few theoretical foundations of these sampling schemes have been provided. Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive heuristic, even though it can provide good performance. In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. This allows deriving the theoretically optimal sampling distribution, yielding the best theoretical convergence speed. Elaborating on the knowledge of the ideal sampling scheme, we exhibit new theoretical foundations of Prioritized Experience Replay. The optimal sampling distribution being intractable, we make several approximations providing good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer. LaBER, which can be combined with Deep Q-Networks, distributional RL agents or actor-critic methods, yields improved performance over a diverse range of Atari games and PyBullet environments, compared to the base agent it is implemented on and to other prioritization schemes.
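To make the importance-sampling view of gradient estimation concrete, here is a minimal toy sketch. It is not the paper's derivation or the LaBER algorithm. The abstract does not state the optimal distribution, so the sketch relies on the standard importance-sampling result that, for scalar terms, sampling proportionally to their magnitude minimizes the variance of the reweighted mean estimate. The scalar "gradients" are synthetic, and all names (`importance_sampled_mean`, `estimator_variance`, `magnitude_proportional`) are hypothetical.

```python
import random
import statistics


def importance_sampled_mean(grads, probs, batch_size, rng):
    """Estimate the mean of `grads` from indices drawn with probabilities `probs`.

    Each drawn term is reweighted by 1 / (n * p_i), which keeps the estimate
    unbiased for any sampling distribution with p_i > 0 wherever grads[i] != 0.
    """
    n = len(grads)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    return sum(grads[i] / (n * probs[i]) for i in idx) / batch_size


def estimator_variance(grads, probs, batch_size, rng, trials=2000):
    """Empirical variance of the mini-batch estimate over repeated draws."""
    estimates = [
        importance_sampled_mean(grads, probs, batch_size, rng)
        for _ in range(trials)
    ]
    return statistics.pvariance(estimates)


rng = random.Random(0)
n = 500

# Synthetic one-dimensional "per-transition gradients" with heavy-tailed
# magnitudes (an assumption made purely for illustration).
grads = [rng.gauss(0.0, 1.0) * rng.lognormvariate(0.0, 1.5) for _ in range(n)]

# Baseline: uniform sampling of the buffer.
uniform = [1.0 / n] * n

# Alternative: sample each item proportionally to its gradient magnitude.
total = sum(abs(g) for g in grads)
magnitude_proportional = [abs(g) / total for g in grads]

var_uniform = estimator_variance(grads, uniform, 32, rng)
var_magnitude = estimator_variance(grads, magnitude_proportional, 32, rng)
print(f"uniform sampling variance:       {var_uniform:.5f}")
print(f"magnitude-proportional variance: {var_magnitude:.5f}")
```

In this toy setting the magnitude-proportional distribution gives a noticeably lower-variance estimate than uniform sampling at the same batch size. In a real agent the per-sample gradient magnitudes are not available for the whole buffer, which is consistent with the abstract's remark that the optimal distribution is intractable and must be approximated.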
