TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs

Abstract

Modern deep learning (DL) workloads increasingly use complex deep reinforcement learning (DRL) algorithms that generate training data within the learning loop. This results in programs with several nested loops and dynamic data dependencies between tensors. While DL systems with eager execution support such dynamism, they lack the optimizations and smart scheduling of graph-based execution. Graph-based execution, however, cannot express dynamic tensor shapes, and instead requires multiple static subgraphs. For DRL, either execution model therefore leads to redundant computation, reduced parallelism, and less efficient memory management.

We describe TimeRL, a system for executing dynamic DRL programs that combines the dynamism of eager execution with the whole-program optimizations and scheduling of graph-based execution. TimeRL achieves this by introducing the declarative programming model of recurrent tensors, which allows users to define dynamic dependencies as intuitive recurrence equations. TimeRL translates recurrent tensors into a polyhedral dependence graph (PDG) that represents dynamic dependencies as symbolic expressions. Through simple PDG transformations, TimeRL applies whole-program optimizations such as automatic vectorization, incrementalization, and operator fusion. The PDG also allows TimeRL to compute an efficient program-wide execution schedule, which decides on buffer deallocations, buffer donations, and GPU/CPU memory swapping. We show that TimeRL executes current DRL algorithms up to 47× faster than existing DRL systems, while using 16× less peak GPU memory.
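To make "dynamic dependencies as recurrence equations" concrete, the sketch below uses the discounted return, a standard DRL quantity, written as the recurrence G[t] = r[t] + gamma * G[t+1] with boundary condition G[T] = 0. Each timestep depends on the next one, so the dependency runs backwards along the time dimension. This is plain Python for illustration only; it is not TimeRL's recurrent-tensor API, and the function names `returns_recurrence` and `returns_closed_form` are hypothetical. The second function rewrites the same quantity without the loop-carried dependence, which shows in general terms why such rewrites expose parallelism across t. It is not TimeRL's vectorization transformation.

```python
from typing import List


def returns_recurrence(rewards: List[float], gamma: float) -> List[float]:
    """Evaluate G[t] = r[t] + gamma * G[t+1] with G[T] = 0.

    The dependency points to t+1, so t must be visited in reverse order.
    """
    T = len(rewards)
    G = [0.0] * (T + 1)  # G[T] holds the boundary condition 0.0
    for t in range(T - 1, -1, -1):
        G[t] = rewards[t] + gamma * G[t + 1]
    return G[:T]


def returns_closed_form(rewards: List[float], gamma: float) -> List[float]:
    """Same quantity as G[t] = sum_{k=t}^{T-1} gamma**(k - t) * r[k].

    No entry reads another entry of G, so every t can be computed
    independently (at the cost of O(T^2) work in this naive form).
    """
    T = len(rewards)
    return [sum(gamma ** (k - t) * rewards[k] for k in range(t, T)) for t in range(T)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    print(returns_recurrence(rewards, 0.5))  # [1.5, 1.0, 2.0]
    print(returns_closed_form(rewards, 0.5))  # [1.5, 1.0, 2.0]
```

The recurrent form does O(T) work but is inherently sequential. The closed form trades extra arithmetic for independence between timesteps, which is the kind of trade-off a system reasoning over the whole time dimension can weigh.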
