A Scalable Finite Difference Method for Deep Reinforcement Learning

Abstract

Several low-bandwidth distributable black-box optimization algorithms in the family of finite differences such as Evolution Strategies have recently been shown to perform nearly as well as tailored Reinforcement Learning methods in some Reinforcement Learning domains. One shortcoming of these black-box methods is that they must collect information about the structure of the return function at every update, and can often employ only information drawn from a distribution centered around the current parameters. As a result, when these algorithms are distributed across many machines, a significant portion of total runtime may be spent with many machines idle, waiting for a final return and then for an update to be calculated. In this work we introduce a novel method to use older data in finite difference algorithms, which produces a scalable algorithm that avoids significant idle time or wasted computation.
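
To make the shortcoming described above concrete, the following is a minimal sketch of a standard antithetic finite-difference (Evolution Strategies style) update. It is not the method introduced in this work. The function names, the toy return function, and the hyperparameters (`sigma`, `n_pairs`, `lr`) are all assumptions chosen for illustration. Note that every perturbation is drawn around the current parameters, and that the update cannot be computed until all returns have arrived, which is where idle time would arise in a distributed setting.

```python
import random

def es_gradient_estimate(f, theta, sigma=0.1, n_pairs=50, rng=None):
    """Antithetic finite-difference estimate of the gradient of f at theta."""
    rng = rng or random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        # Perturbation sampled from a Gaussian centered on the current parameters.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # In a distributed run, each of these evaluations would be one worker's rollout.
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * n_pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    # Only returned once every evaluation above has finished.
    return grad

def toy_return(theta):
    # Hypothetical stand-in for an episode return, maximized at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

def es_ascent(f, theta, steps=200, lr=0.05, seed=0):
    """Repeated gradient ascent using fresh samples at every update."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = es_gradient_estimate(f, theta, rng=rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta

final_theta = es_ascent(toy_return, [0.0, 0.0])
```

In this sketch the samples from one update are discarded before the next, since they were centered on parameters that have since changed. Reusing such older data is the gap the abstract says the proposed method addresses.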
