Assessing the Potential of Classical Q-learning in General Game Playing

Abstract

After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interest in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive, and its theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning to research AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used in GGP by (Banerjee & Stone, IJCAI 2007). In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex; source code: https://github.com/wh1992v/ggp-rl) to allow comparison with Banerjee et al. We find that Q-learning converges to a high win rate in GGP. For the $\epsilon$-greedy strategy, we propose a first enhancement, the dynamic $\epsilon$ algorithm. In addition, inspired by (Gelly & Silver, ICML 2007), we combine online search (Monte Carlo Search) with offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show that classical table-based Q-learning, if augmented by appropriate enhancements, can perform well in small games.
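The abstract refers to classical table-based Q-learning with an $\epsilon$-greedy strategy. A minimal sketch of those two pieces follows, assuming a generic state/action interface rather than the paper's GGP setup. The names `epsilon_schedule`, `choose_action`, and `q_update`, and the default values of `alpha`, `gamma`, and the epsilon bounds, are hypothetical. In particular, the linear decay in `epsilon_schedule` is only an illustrative stand-in, not the paper's dynamic $\epsilon$ algorithm, and the Monte Carlo Search component of QM-learning is not shown.

```python
import random
from collections import defaultdict


def epsilon_schedule(episode, total_episodes, eps_start=1.0, eps_end=0.05):
    """Hypothetical linear decay of epsilon over training.
    This is an assumption for illustration, not the paper's dynamic-epsilon rule."""
    frac = min(episode / max(total_episodes, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore uniformly with probability epsilon,
    otherwise pick the action with the highest tabulated Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    A terminal next_state is signalled by an empty next_actions list."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# The Q-table maps (state, action) pairs to values, defaulting to 0.0.
q = defaultdict(float)
q_update(q, "s0", "a", reward=1.0, next_state="end", next_actions=[])
print(q[("s0", "a")])  # 0.1
```

Because the table is a dictionary keyed by (state, action), it only stores pairs that are actually visited, which is why this kind of approach stays practical only for small games such as those in the abstract.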
