The Formalism-Implementation Gap in Reinforcement Learning Research

Abstract

The last decade has seen an upswing in interest in and adoption of reinforcement learning (RL) techniques, in large part due to their demonstrated capabilities at performing certain tasks at "super-human levels". This has incentivized the community to prioritize research that demonstrates RL agent performance, often at the expense of research aimed at understanding their learning dynamics. Performance-focused research runs the risk of overfitting to academic benchmarks, thereby rendering them less useful, which can make it difficult to transfer proposed techniques to novel problems. Further, it implicitly diminishes work that does not push the performance frontier but aims at improving our understanding of these techniques. This paper argues two points: (i) RL research should stop focusing solely on demonstrating agent capabilities, and focus more on advancing the science and understanding of reinforcement learning; and (ii) we need to be more precise about how our benchmarks map to the underlying mathematical formalisms. We use the popular Arcade Learning Environment (ALE; Bellemare et al., 2013) as an example of a benchmark that, despite being increasingly considered "saturated", can be effectively used for developing this understanding and for facilitating the deployment of RL techniques in impactful real-world problems.
