PPO Dash: Improving Generalization in Deep Reinforcement Learning

Abstract

Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices for the PPO algorithm, using the Obstacle Tower Challenge to empirically study their impact on generalization. Our experiments show that their combination provides state-of-the-art performance on the Obstacle Tower Challenge.
