Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari

Abstract

Reinforcement learning (RL) has seen great advancements in the past few years. Nevertheless, the consensus among the RL community is that currently used methods, despite all their benefits, suffer from extreme data inefficiency, especially in rich visual domains like Atari. To circumvent this problem, novel approaches were introduced that often claim to be much more efficient than popular variations of the state-of-the-art DQN algorithm. In this paper, however, we demonstrate that the newly proposed techniques simply used unfair baselines in their experiments. Namely, we show that the actual improvement in efficiency came from allowing the algorithm more training updates for each data sample, and not from employing the new methods. By allowing DQN to execute network updates more frequently, we manage to reach similar or better results than the recently proposed advancements, often at a fraction of the complexity and computational cost. Furthermore, based on the outcomes of the study, we argue that an agent similar to the modified DQN presented in this paper should be used as a baseline for any future work aimed at improving the sample efficiency of deep reinforcement learning.
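
To make the notion of "more training updates for each data sample" concrete, the following minimal sketch counts how many network updates an agent performs within a fixed budget of environment steps. The function name, the step budget, and the update periods are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def gradient_updates(env_steps: int, steps_per_update: int, warmup_steps: int = 0) -> int:
    """Count network updates performed within a budget of environment steps.

    env_steps:        total environment interactions allowed (the data budget)
    steps_per_update: environment steps collected between two network updates
    warmup_steps:     initial steps used only to fill the replay buffer
    """
    if steps_per_update <= 0:
        raise ValueError("steps_per_update must be positive")
    # No updates happen during warm-up; afterwards one update every
    # `steps_per_update` environment steps.
    return max(env_steps - warmup_steps, 0) // steps_per_update


if __name__ == "__main__":
    budget = 100_000  # hypothetical data budget, for illustration only
    for period in (4, 1):  # hypothetical update periods
        updates = gradient_updates(budget, period)
        print(f"one update every {period} step(s): "
              f"{updates} updates, {updates / budget:.2f} updates per sample")
```

With the same amount of collected data, the agent that updates after every step performs four times as many network updates as the one that updates every fourth step. Under these assumptions, a comparison is only fair when the baseline is given a comparable number of updates per sample.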
