Abstract
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.
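
As a rough illustration of what "periodically resetting a part of the agent" could look like, the following is a minimal sketch and not the implementation used in this work. The two-layer agent, the names `init_layer`, `make_agent`, `fake_update`, and `train`, the choice to reset only the final layer, the reset period, and the decision to keep the collected data across resets are all assumptions made for the example; the abstract does not specify them.

```python
import random


def init_layer(n_in, n_out, rng):
    """Return a freshly initialized weight matrix as a list of rows."""
    scale = 1.0 / n_in ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def make_agent(rng):
    # Hypothetical agent with two parts: an "encoder" and a "head".
    return {"encoder": init_layer(4, 8, rng), "head": init_layer(8, 2, rng)}


def fake_update(agent, lr=0.01):
    # Stand-in for a gradient step: shifts every weight by lr.
    for layer in agent.values():
        for row in layer:
            for j in range(len(row)):
                row[j] += lr


def train(total_steps, reset_every, seed=0):
    rng = random.Random(seed)
    agent = make_agent(rng)
    replay_buffer = []  # assumption: collected data is kept across resets
    resets = 0
    for step in range(1, total_steps + 1):
        replay_buffer.append(step)  # placeholder for an environment transition
        fake_update(agent)
        if step % reset_every == 0:
            # Assumption: only the head is re-initialized; the encoder is kept.
            agent["head"] = init_layer(8, 2, rng)
            resets += 1
    return agent, replay_buffer, resets
```

Calling `train(10, 5)` in this sketch resets the head twice while the encoder and the stored data carry over, so the re-initialized part is retrained on all experience gathered so far rather than only on the earliest interactions.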