Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling

Abstract

A computing cluster that interconnects multiple compute nodes is used to accelerate distributed reinforcement learning based on DQN (Deep Q-Network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment, and a Learner node optimizes the DQN model. Since data transfer between Actor and Learner nodes grows with the number of Actor nodes and the size of their experiences, the communication overhead between them is one of the major performance bottlenecks. In this paper, this communication is accelerated by DPDK-based network optimizations, and a DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes interconnected with a 40GbE (40 Gbit Ethernet) network. Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication latencies for prioritized experience sampling by 21.9% to 29.1%.
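To make the role of the experience replay memory concrete, the following is a minimal in-process sketch of prioritized experience sampling. It is not the paper's DPDK-based server: the abstract does not specify the sampling scheme, so proportional prioritization with an exponent `alpha` is assumed here, and the class name `PrioritizedReplayMemory` and its methods are hypothetical. No networking or kernel bypassing is modeled.

```python
import random


class PrioritizedReplayMemory:
    """Toy replay memory (hypothetical; assumes proportional prioritization)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # assumed exponent: 0 gives uniform sampling
        self.experiences = []
        self.priorities = []
        self.next_slot = 0          # ring-buffer write position

    def push(self, experience, priority=1.0):
        """Called on behalf of an Actor: store one experience."""
        if len(self.experiences) < self.capacity:
            self.experiences.append(experience)
            self.priorities.append(priority)
        else:
            # Memory is full: overwrite the oldest slot.
            self.experiences[self.next_slot] = experience
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        """Called on behalf of the Learner: draw (index, experience) pairs
        with probability proportional to priority ** alpha."""
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.experiences)),
                              weights=weights, k=batch_size)
        return [(i, self.experiences[i]) for i in indices]

    def update_priority(self, index, priority):
        """Called after a training step, e.g. with a new TD-error magnitude."""
        self.priorities[index] = priority
```

In the setting described above, `push` would correspond to Actor-to-memory traffic, while `sample` and `update_priority` would correspond to Learner-to-memory traffic; these are the exchanges whose latency an in-network memory server aims to reduce.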
