Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

Abstract

We study how the behavior of deep policy gradient algorithms reflects theconceptual framework motivating their development. We propose a fine-grainedanalysis of state-of-the-art methods based on key aspects of this framework:gradient estimation, value prediction, optimization landscapes, and trustregion enforcement. We find that from this perspective, the behavior of deeppolicy gradient algorithms often deviates from what their motivating frameworkwould predict. Our analysis suggests first steps towards solidifying thefoundations of these algorithms, and in particular indicates that we may needto move beyond the current benchmark-centric evaluation methodology.

Quick Read (beta)

loading the full paper ...