Assessing the Potential of Classical Q-learning in General Game Playing

  • 2018-10-14 18:49:33
  • Hui Wang, Michael Emmerich, Aske Plaat

Abstract

After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interest in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and its theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning research on AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used in GGP by (Banerjee & Stone, IJCAI 2007). In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex) to allow comparison to Banerjee et al. (source code: https://github.com/wh1992v/ggp-rl). We find that Q-learning converges to a high win rate in GGP. For the $\epsilon$-greedy strategy, we propose a first enhancement, the dynamic $\epsilon$ algorithm. In addition, inspired by (Gelly & Silver, ICML 2007), we add online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show that classical table-based Q-learning, if augmented by appropriate enhancements, can perform well in small games.
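
For readers unfamiliar with the baseline, the following is a minimal sketch of classical table-based Q-learning with an $\epsilon$-greedy policy, written in its generic single-agent form. It is not taken from the paper or its repository. The function names, the default values of `alpha` and `gamma`, and in particular the linear decay in `linear_epsilon` are assumptions chosen for illustration; the abstract does not specify how the dynamic $\epsilon$ algorithm varies $\epsilon$, so the schedule shown here should not be read as the authors' method.

```python
import random
from collections import defaultdict


def linear_epsilon(episode, total_episodes, start=1.0, end=0.0):
    """Hypothetical schedule (an assumption, not the paper's dynamic epsilon):
    move epsilon linearly from `start` to `end` over the training run."""
    frac = min(max(episode / max(total_episodes, 1), 0.0), 1.0)
    return start + (end - start) * frac


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise pick a greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a in actions if q[(state, a)] == best])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Classical Q-learning update on a table keyed by (state, action).

    A terminal next state is signalled by an empty `next_actions` list,
    in which case the bootstrapped future value is 0.
    """
    future = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * future
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Tiny usage example: one winning move from state "s0" into a terminal state.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q, "s0", "a", reward=1.0, next_state="terminal",
         next_actions=[], alpha=0.5)
greedy_action = epsilon_greedy(q, "s0", ["a", "b"], epsilon=0.0)
```

The table is a plain dictionary, which is only practical because the state spaces of small-board games are small enough to enumerate; this is the setting the abstract argues for.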

 
