Abstract
Deep reinforcement learning continues to show tremendous potential in achieving task-level autonomy; however, its computational and energy demands remain prohibitively high. In this paper, we tackle this problem by applying quantization to reinforcement learning. To that end, we introduce a novel Reinforcement Learning (RL) training paradigm, \textit{ActorQ}, to speed up actor-learner distributed RL training. \textit{ActorQ} leverages 8-bit quantized actors to speed up data collection without affecting learning convergence. Our quantized distributed RL training system, \textit{ActorQ}, demonstrates end-to-end speedups \blue{between 1.5$\times$ and 5.41$\times$}, and faster convergence over full-precision training, on a range of tasks (DeepMind Control Suite) and different RL algorithms (D4PG, DQN). Furthermore, we compare the carbon emissions (kg of CO2) of \textit{ActorQ} versus standard reinforcement learning \blue{algorithms} on various tasks. Across various settings, we show that \textit{ActorQ} enables more environmentally friendly reinforcement learning by achieving \blue{carbon emission improvements between 1.9$\times$ and 3.76$\times$} compared to training RL agents in full precision. We believe that this is the first of many future works on enabling computationally efficient, energy-efficient, and sustainable reinforcement learning. The source code is available for public use at \url{https://github.com/harvard-edge/QuaRL}.
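To make the idea of an 8-bit quantized actor concrete, the following is a minimal sketch of uniform affine (min/max) quantization applied to a small list of weights. The abstract does not specify the quantization scheme, so this scheme, the function names `quantize_uint8` and `dequantize_uint8`, and the example weights are all illustrative assumptions and not the ActorQ implementation.

```python
# Illustrative sketch only: uniform affine quantization is an assumed scheme,
# not necessarily the one used by ActorQ.

def quantize_uint8(weights):
    """Map a list of floats to integer codes in [0, 255]."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant input, where the range would be zero.
    scale = (hi - lo) / 255.0 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_uint8(codes, scale, lo):
    """Map 8-bit codes back to approximate float values."""
    return [lo + scale * c for c in codes]


# Hypothetical actor weights, chosen only for the example.
weights = [-0.5, 0.0, 0.3, 1.0]
codes, scale, zero = quantize_uint8(weights)
restored = dequantize_uint8(codes, scale, zero)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In a setup like the one the abstract describes, the learner would keep full-precision weights and the actors would run inference on the 8-bit codes. The bound on `max_error` gives a rough sense of why such a substitution can leave the collected experience largely unchanged.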