Accelerated Methods for Deep Reinforcement Learning

Abstract

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm. All neural network computations use GPUs, accelerating both data collection and training. Our results include using an entire DGX-1 to learn successful strategies in Atari games in mere minutes, using both synchronous and asynchronous algorithms.
