Suphx: Mastering Mahjong with Deep Reinforcement Learning

Abstract

Artificial Intelligence (AI) has achieved great success in many domains, and game AI has been widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, and shogi, or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold'em and StarCraft II). Mahjong is a multi-player imperfect-information game that is popular worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players on the Tenhou platform. This is the first time that a computer program has outperformed most top human players in Mahjong.
