Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning

  • 2021-02-22 19:29:18
  • Brett Daley, Cameron Hickert, Christopher Amato

Abstract

Deep Reinforcement Learning (RL) methods rely on experience replay to approximate the minibatched supervised learning setting; however, unlike supervised learning where access to lots of training data is crucial to generalization, replay-based deep RL appears to struggle in the presence of extraneous data. Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large. This suggests that outdated experiences somehow impact the performance of deep RL, which should not be the case for off-policy methods like DQN. Consequently, we re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation. We show that -- despite conventional wisdom -- sampling from the uniform distribution does not yield uncorrelated training samples and therefore biases gradients during training. Our theory prescribes a special non-uniform distribution to cancel this effect, and we propose a stratified sampling scheme to efficiently implement it.

 
