Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

  • 2018-11-06 18:54:21
  • Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry
We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this perspective, the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict. Our analysis suggests first steps towards solidifying the foundations of these algorithms, and in particular indicates that we may need to move beyond the current benchmark-centric evaluation methodology.


