Abstract
Deep Reinforcement Learning (RL) has been demonstrated to yield capable agents and control policies in several domains but is commonly plagued by prohibitively long training times. Additionally, in the case of continuous control problems, the applicability of learned policies on real-world embedded devices is limited due to the lack of real-time guarantees and portability of existing deep learning libraries. To address these challenges, we present RLtools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. Leveraging the template meta-programming capabilities of recent C++ standards, we provide composable components that can be tightly integrated by the compiler. Its novel architecture allows RLtools to be used seamlessly on a heterogeneous set of platforms, from HPC clusters through workstations and laptops to smartphones, smartwatches, and microcontrollers. Specifically, due to the tight integration of the RL algorithms with simulation environments, RLtools can solve popular RL problems like the Pendulum-v1 swing-up about 7 to 15 times faster in terms of wall-clock training time compared to other popular RL frameworks when using TD3. We also provide a low-overhead and parallelized interface to the MuJoCo simulator, showing that our PPO implementation achieves state-of-the-art returns in the Ant-v4 environment while being 25% to 30% faster in terms of wall-clock training time. Finally, we also benchmark the policy inference on a diverse set of microcontrollers and show that in most cases our optimized inference implementation is much faster than even the manufacturer's DSP libraries. To the best of our knowledge, RLtools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of TinyRL. The source code is available through our project page at https://rl.tools.
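To illustrate what compile-time composition of components can look like in general, the following is a minimal sketch and not the RLtools API. All names (`DenseLayer`, `Sequential`, `demo_forward`) are hypothetical and chosen for illustration only. It assumes that layer sizes are template parameters, so that storage is statically sized, no heap allocation occurs, and mismatched layer dimensions are rejected by the compiler.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fully connected layer; dimensions are fixed at compile time,
// so all storage is statically sized (no dynamic allocation).
template <typename T, std::size_t IN, std::size_t OUT, bool RELU>
struct DenseLayer {
    using value_type = T;
    static constexpr std::size_t INPUT_DIM = IN;
    static constexpr std::size_t OUTPUT_DIM = OUT;

    std::array<std::array<T, IN>, OUT> weights{};  // zero-initialized
    std::array<T, OUT> biases{};                   // zero-initialized

    void forward(const std::array<T, IN>& input, std::array<T, OUT>& output) const {
        for (std::size_t o = 0; o < OUT; ++o) {
            T acc = biases[o];
            for (std::size_t i = 0; i < IN; ++i) {
                acc += weights[o][i] * input[i];
            }
            // Apply ReLU only if requested via the template parameter.
            output[o] = (RELU && acc < T(0)) ? T(0) : acc;
        }
    }
};

// Hypothetical composition of two components; the dimension check happens
// at compile time, and the compiler can inline across the layer boundary.
template <typename First, typename Second>
struct Sequential {
    static_assert(First::OUTPUT_DIM == Second::INPUT_DIM,
                  "output of the first layer must match input of the second");
    using value_type = typename First::value_type;
    static constexpr std::size_t INPUT_DIM = First::INPUT_DIM;
    static constexpr std::size_t OUTPUT_DIM = Second::OUTPUT_DIM;

    First first;
    Second second;

    void forward(const std::array<value_type, INPUT_DIM>& input,
                 std::array<value_type, OUTPUT_DIM>& output) const {
        std::array<value_type, First::OUTPUT_DIM> hidden{};  // stack buffer
        first.forward(input, hidden);
        second.forward(hidden, output);
    }
};

// Small usage example with hand-set weights (illustrative values only).
inline float demo_forward() {
    Sequential<DenseLayer<float, 2, 3, true>, DenseLayer<float, 3, 1, false>> policy;
    for (auto& row : policy.first.weights) row.fill(1.0f);
    for (auto& row : policy.second.weights) row.fill(0.5f);
    std::array<float, 2> observation{1.0f, 2.0f};
    std::array<float, 1> action{};
    policy.forward(observation, action);
    return action[0];
}
```

Because every size is known to the compiler in such a design, loops can be unrolled and calls inlined, and the same source can in principle be built for targets without an operating system or heap. Whether and how RLtools realizes this internally is not specified by the sketch.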