Abstract
Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.