Podracer architectures for scalable Reinforcement Learning

Abstract

Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems. Popular training pipelines that use these frameworks for deep learning typically focus on (un-)supervised learning. How to best train reinforcement learning (RL) agents at scale is still an active research area. In this report we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way. Specifically we describe two architectures designed to make the best use of the resources available on a TPU Pod (a special configuration in a Google data center that features multiple TPU devices connected to each other by extremely low latency communication channels).
