Accelerating Deep Neuroevolution on Distributed FPGAs for Reinforcement Learning Problems

Abstract

Reinforcement learning augmented by the representational power of deep neural networks has shown promising results on high-dimensional problems, such as game playing and robotic control. However, the sequential nature of these problems poses a fundamental challenge for computational efficiency. Recently, alternative approaches such as evolutionary strategies and deep neuroevolution have demonstrated competitive results with faster training times on distributed CPU cores. Here, we report record training times (running at about 1 million frames per second) for Atari 2600 games using deep neuroevolution implemented on distributed FPGAs. The acceleration comes from implementing the game console, image pre-processing, and the neural network together in hardware as an optimized pipeline, multiplied by the system-level parallelism. These results are the first application demonstration on the IBM Neural Computer, a custom-designed system that consists of 432 Xilinx FPGAs interconnected in a 3D mesh network topology. In addition to high performance, experiments also showed improved accuracy for all games compared with the CPU implementation of the same algorithm.
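To make "deep neuroevolution" concrete, the sketch below shows a simple genetic algorithm of the kind commonly used in that setting: each network is a flat vector of parameters, children are produced by adding Gaussian noise (no crossover or gradients), the top individuals are kept by truncation selection, and the single best one is carried over unchanged (elitism). This is an illustrative assumption, not the FPGA implementation described here. The names `evolve`, `mutate`, and `toy_fitness` are hypothetical, and `toy_fitness` merely stands in for the game score of an Atari episode. The fitness evaluations inside each generation are independent of one another, which is the part that a distributed system can run in parallel.

```python
import random
from typing import Callable, List

Genome = List[float]  # flat parameter vector of one (hypothetical) policy network


def mutate(parent: Genome, sigma: float, rng: random.Random) -> Genome:
    # Gaussian perturbation of every parameter; no crossover, no gradients.
    return [w + sigma * rng.gauss(0.0, 1.0) for w in parent]


def evolve(fitness: Callable[[Genome], float], n_params: int, pop_size: int = 50,
           n_parents: int = 10, sigma: float = 0.1, generations: int = 100,
           seed: int = 0) -> Genome:
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    elite = population[0]
    for _ in range(generations):
        # Each fitness call is independent: this is the embarrassingly parallel
        # step (one game episode per individual in the Atari setting).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:n_parents]  # truncation selection
        elite = parents[0]            # best individual survives unchanged
        population = [elite] + [mutate(rng.choice(parents), sigma, rng)
                                for _ in range(pop_size - 1)]
    return elite


# Hypothetical stand-in for an episode return: higher is better,
# maximized (at 0) when the genome equals TARGET.
TARGET = [1.0, -2.0, 0.5]


def toy_fitness(genome: Genome) -> float:
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))


best = evolve(toy_fitness, n_params=3, generations=200, seed=0)
print([round(w, 2) for w in best], round(toy_fitness(best), 4))
```

Because the elite is always re-entered into the next population, the best fitness never decreases from one generation to the next. In a real deep neuroevolution run the genome would hold the weights of a convolutional policy network and `toy_fitness` would be replaced by playing one or more game episodes.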
