Deep Reinforcement Learning and the Deadly Triad

  • 2018-12-06 16:36:20
  • Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil
  • 13

Abstract

We know from reinforcement learning theory that temporal difference learning can fail in certain cases. Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge with the value estimates becoming unbounded. However, several algorithms successfully combine these three properties, which indicates that there is at least a partial gap in our understanding. In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models - deep Q-networks trained with experience replay - analysing how the components of this system play a role in the emergence of the deadly triad, and in the agent's performance.
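
To make the divergence claim concrete, the following is a minimal sketch of the well-known two-state example discussed by Sutton and Barto (2018). It is not taken from this paper's experiments. It assumes a linear value function with a single weight `w`, features 1 and 2 for the two states, a reward of 0, and updates applied only to the transition from the first state to the second (an off-policy update distribution). The function name `td0_two_state` and the parameter values are illustrative choices.

```python
def td0_two_state(w=1.0, alpha=0.1, gamma=0.99, steps=100):
    """Semi-gradient TD(0) applied only to the transition s1 -> s2.

    Illustrative assumptions: v(s1) = 1 * w, v(s2) = 2 * w, reward = 0.
    Returns the list of weights after each update.
    """
    history = [w]
    for _ in range(steps):
        v_s1 = 1.0 * w  # function approximation: feature of s1 is 1
        v_s2 = 2.0 * w  # feature of s2 is 2
        td_error = 0.0 + gamma * v_s2 - v_s1  # bootstrapping on v(s2)
        w += alpha * td_error * 1.0  # gradient of v(s1) w.r.t. w is 1
        history.append(w)
    return history


# Each update multiplies w by 1 + alpha * (2 * gamma - 1).
diverging = td0_two_state(gamma=0.99)  # factor > 1: |w| grows without bound
shrinking = td0_two_state(gamma=0.4)   # factor < 1: |w| decays towards 0

print(f"gamma=0.99: w after 100 updates = {diverging[-1]:.3g}")
print(f"gamma=0.4:  w after 100 updates = {shrinking[-1]:.3g}")
```

In this sketch all three ingredients are present: the shared weight is the function approximation, the target uses the current estimate of the next state, and the second state is never updated on its own. With a discount above 0.5 the weight grows geometrically, which is the kind of unbounded value estimate the abstract refers to.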
