Abstract
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by the success of AlphaZero. However, algorithms of this form have been unable to cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search for imperfect-information games. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results show that ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI. We also prove that ReBeL converges to a Nash equilibrium in two-player zero-sum games in tabular settings.