Quantifying Generalization in Reinforcement Learning

  • 2018-12-06 04:29:29
  • Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman

Abstract

In this paper, we investigate the problem of overfitting in deep reinforcement learning. Among the most common benchmarks in RL, it is customary to use the same environments for both training and testing. This practice offers relatively little insight into an agent's ability to generalize. We address this issue by using procedurally generated environments to construct distinct training and test sets. Most notably, we introduce a new environment called CoinRun, designed as a benchmark for generalization in RL. Using CoinRun, we find that agents overfit to surprisingly large training sets. We then show that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.

 
