QuaRL: Quantization for Sustainable Reinforcement Learning

Abstract

Deep reinforcement learning has achieved significant milestones; however, the computational demands of reinforcement learning training and inference remain substantial. Quantization is an effective method to reduce the computational overheads of neural networks, though in the context of reinforcement learning, it is unknown whether quantization's computational benefits outweigh the accuracy costs introduced by the corresponding quantization error. To quantify this tradeoff, we perform a broad study applying quantization to reinforcement learning. We apply standard quantization techniques such as post-training quantization (PTQ) and quantization-aware training (QAT) to a comprehensive set of reinforcement learning tasks (Atari, Gym), algorithms (A2C, DDPG, DQN, D4PG, PPO), and models (MLPs, CNNs) and show that policies may be quantized to 8 bits without degrading reward, enabling significant inference speedups on resource-constrained edge devices. Motivated by the effectiveness of standard quantization techniques on reinforcement learning policies, we introduce a novel quantization algorithm, \textit{ActorQ}, for quantized actor-learner distributed reinforcement learning training. By leveraging full-precision optimization on the learner and quantized execution on the actors, \textit{ActorQ} enables 8-bit inference while maintaining convergence. We develop a system for quantized reinforcement learning training around \textit{ActorQ} and demonstrate end-to-end speedups of $>1.5\times$ to $2.5\times$ over full-precision training on a range of tasks (DeepMind Control Suite). Finally, we break down the various runtime costs of distributed reinforcement learning training (such as communication time, inference time, and model load time) and evaluate the effects of quantization on these system attributes.
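
As a rough illustration of what quantizing policy weights to 8 bits can look like, the sketch below applies generic uniform affine quantization to a list of weights. It is not taken from the paper and is not claimed to be the PTQ scheme used in the study or in \textit{ActorQ}; the function names `quantize_uint8` and `dequantize` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to integers in [0, 255] with a uniform affine scheme.

    Illustrative only: a generic textbook scheme, assumed here for the
    example, not the paper's implementation.
    """
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    # One quantization step; fall back to 1.0 when all weights are zero.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # Integer code that represents the real value 0.0.
    zero_point = round(-lo / scale)
    # Round each weight to the nearest step and clamp to the 8-bit range.
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]
    codes, scale, zero_point = quantize_uint8(weights)
    print(codes, scale, zero_point)
    print(dequantize(codes, scale, zero_point))
```

The reconstructed values differ from the originals by an error on the order of one quantization step (`scale`), which is the quantization error whose effect on reward the study sets out to measure.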
