Model-Based Stabilisation of Deep Reinforcement Learning

Abstract

Though successful in high-dimensional domains, deep reinforcement learning exhibits high sample complexity and suffers from stability issues, as reported by researchers and practitioners in the field. These problems hinder the application of such algorithms in real-world and safety-critical scenarios. In this paper, we take steps towards stable and efficient reinforcement learning by following a model-based approach that is known to reduce agent-environment interactions. Namely, our method augments deep Q-networks (DQNs) with model predictions for transitions, rewards, and termination flags. Having the model at hand, we then conduct a rigorous theoretical study of our algorithm and show, for the first time, convergence to a stationary point. En route, we provide a counter-example showing that 'vanilla' DQNs can diverge, confirming practitioners' and researchers' experiences. Our proof is novel in its own right and can be extended to other forms of deep reinforcement learning. In particular, we believe exploiting the relation between reinforcement (with deep function approximators) and online learning can serve as a recipe for future proofs in the domain. Finally, we validate our theoretical results in 20 games from the Atari benchmark. Our results show that following the proposed model-based learning approach not only ensures convergence but also leads to a reduction in sample complexity and superior performance.
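To make the role of the three model outputs concrete, the following is a minimal sketch of a generic one-step Q-learning target computed from model predictions rather than from an observed environment step. It is not the algorithm analysed in the paper. The function `model_based_td_target`, the toy stand-ins `toy_model` and `toy_q`, and the choice to weight the bootstrap term by a predicted termination probability are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]


def model_based_td_target(
    state: State,
    action: int,
    model: Callable[[State, int], Tuple[State, float, float]],
    q_values: Callable[[State], Sequence[float]],
    gamma: float = 0.99,
) -> float:
    """One-step Q-learning target built from model predictions.

    `model` is a hypothetical learned model returning a tuple of
    (predicted next state, predicted reward, predicted termination probability).
    `q_values` is a hypothetical Q-function returning one value per action.
    """
    next_state, reward, done_prob = model(state, action)
    # Bootstrap from the greedy value at the predicted next state,
    # down-weighted by the predicted probability that the episode has ended.
    bootstrap = max(q_values(next_state))
    return reward + gamma * (1.0 - done_prob) * bootstrap


# Toy stand-ins for the learned model and the Q-network (illustrative only).
def toy_model(state: State, action: int) -> Tuple[State, float, float]:
    next_state = tuple(s + action for s in state)
    return next_state, 1.0, 0.0  # reward 1, episode predicted to continue


def toy_q(state: State) -> Sequence[float]:
    total = sum(state)
    return [total, 2.0 * total]  # two actions


target = model_based_td_target((1.0, 2.0), 1, toy_model, toy_q, gamma=0.5)
print(target)  # 1.0 + 0.5 * 1.0 * max(5.0, 10.0) = 6.0
```

In this sketch, a predicted termination probability of 1 removes the bootstrap term entirely, so the target reduces to the predicted reward.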
