A Study on Overfitting in Deep Reinforcement Learning

Abstract

Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large-scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. These observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
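To make the point about training reward concrete, the following is a minimal sketch of an evaluation protocol that reports reward on held-out environments whose seeds are disjoint from the training seeds. It is an illustrative toy under stated assumptions, not the paper's experimental setup: the one-step "environment" defined by `correct_action`, the memorizing lookup-table agent, and all function names are hypothetical. The agent reaches optimal reward on every training seed, yet that number says nothing about how it behaves on unseen seeds. Only the held-out evaluation exposes the gap.

```python
import random
from typing import Callable, Dict, List

NUM_ACTIONS = 4  # hypothetical size of the action space


def correct_action(seed: int) -> int:
    # Hypothetical one-step task: each seed defines a level whose
    # rewarding action is derived deterministically from the seed.
    return random.Random(seed).randrange(NUM_ACTIONS)


def episode_reward(policy: Callable[[int], int], seed: int) -> float:
    # The agent observes the level id (the seed) and picks one action.
    # Reward is 1.0 for the rewarding action and 0.0 otherwise.
    return 1.0 if policy(seed) == correct_action(seed) else 0.0


def train_memorizing_policy(train_seeds: List[int]) -> Callable[[int], int]:
    # A lookup table that memorizes the optimal action for every training
    # level and falls back to action 0 on levels it has never seen.
    table: Dict[int, int] = {s: correct_action(s) for s in train_seeds}
    return lambda seed: table.get(seed, 0)


def mean_reward(policy: Callable[[int], int], seeds: List[int]) -> float:
    return sum(episode_reward(policy, s) for s in seeds) / len(seeds)


# Disjoint seed ranges: training levels never appear at test time.
train_seeds = list(range(0, 100))
test_seeds = list(range(1000, 1100))

policy = train_memorizing_policy(train_seeds)
train_r = mean_reward(policy, train_seeds)
test_r = mean_reward(policy, test_seeds)
print(f"train={train_r:.2f} test={test_r:.2f} gap={train_r - test_r:.2f}")
```

The training reward here is exactly 1.0 by construction. The test reward only reflects how often the fallback action happens to be correct. Reporting both numbers on disjoint seed sets, rather than training reward alone, is the kind of separation the abstract argues evaluation protocols should make explicit.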
