MinAtar: An Atari-inspired Testbed for More Efficient Reinforcement Learning Experiments

  • 2019-03-07 20:34:36
  • Kenny Young, Tian Tian
  • 16


The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games are varied, showcase aspects of competency we expect from an intelligent agent, and are not biased towards any particular solution approach. The challenges of the ALE include 1) the representation learning problem of extracting pertinent information from the raw pixels, and 2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, in reinforcement learning research, we care more about the latter, but the representation learning problem adds significant computational expense. In response, we introduce MinAtar, short for miniature Atari, a new evaluation platform that captures the general mechanics of specific Atari games while simplifying certain aspects. In particular, we reduce the representational complexity to focus more on behavioural challenges. MinAtar consists of analogues of five Atari games, which play out on a 10x10 grid. MinAtar provides a 10x10xn state representation, where the n channels correspond to game-specific objects, such as the ball, paddle and brick in the game Breakout. While significantly simplified, these domains are still rich enough to allow for interesting behaviours. To demonstrate the challenges posed by these domains, we evaluated a smaller version of the DQN architecture. We also tried variants of DQN without experience replay and without a target network, to assess the impact of those two prominent components in the MinAtar environments. In addition, we evaluated a simpler agent that used actor-critic with eligibility traces, online updating, and no experience replay. We hope that by introducing a set of simplified, Atari-like games we can allow researchers to more efficiently investigate the unique behavioural challenges provided by the ALE.
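To make the 10x10xn state representation concrete, here is a minimal sketch of a channel-per-object grid. It assumes three hypothetical channels ("paddle", "ball", "brick") and an arbitrary object layout chosen for illustration. It is not the actual MinAtar Breakout channel order or API, and the helper names (`make_breakout_like_state`, `channel_count`, `flatten`) are invented here. Each channel is a binary mask marking the cells occupied by one object type.

```python
# Hypothetical sketch of a MinAtar-style 10x10xn binary state.
# Channel names/order and object positions are illustrative assumptions,
# not the real MinAtar Breakout layout.

GRID = 10
CHANNELS = ["paddle", "ball", "brick"]  # assumed channel order (n = 3)


def empty_state(n_channels):
    """Return a GRID x GRID x n_channels nested list of False values."""
    return [[[False] * n_channels for _ in range(GRID)] for _ in range(GRID)]


def make_breakout_like_state(paddle_col, ball_pos, brick_rows):
    """Mark each object's cells in its own channel (one binary mask per object type)."""
    state = empty_state(len(CHANNELS))
    p, b, k = (CHANNELS.index(name) for name in ("paddle", "ball", "brick"))
    state[GRID - 1][paddle_col][p] = True  # paddle sits on the bottom row
    row, col = ball_pos
    state[row][col][b] = True              # single ball cell
    for r in brick_rows:                   # full rows of bricks
        for c in range(GRID):
            state[r][c][k] = True
    return state


def channel_count(state, name):
    """Count how many cells are active in the named channel."""
    idx = CHANNELS.index(name)
    return sum(cell[idx] for row in state for cell in row)


def flatten(state):
    """Flatten to a GRID*GRID*n list of floats, e.g. as network input."""
    return [float(v) for row in state for cell in row for v in cell]


s = make_breakout_like_state(paddle_col=4, ball_pos=(5, 3), brick_rows=[1, 2, 3])
print(len(s), len(s[0]), len(s[0][0]))  # 10 10 3
print(channel_count(s, "brick"))        # 30
print(len(flatten(s)))                  # 300
```

Because every object type gets its own mask, an agent receives object identity directly rather than having to infer it from raw pixels. This is the sense in which the representation learning burden is reduced.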

