Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers

Abstract

Competitive Pokémon Singles (CPS) is a popular strategy game where players learn to exploit their opponent based on imperfect information in battles that can last more than one hundred stochastic turns. AI research in CPS has been led by heuristic tree search and online self-play, but the game may also create a platform to study adaptive policies trained offline on large datasets. We develop a pipeline to reconstruct the first-person perspective of an agent from logs saved from the third-person perspective of a spectator, thereby unlocking a dataset of real human battles spanning more than a decade that grows larger every day. This dataset enables a black-box approach where we train large sequence models to adapt to their opponent based solely on their input trajectory while selecting moves without explicit search of any kind. We study a progression from imitation learning to offline RL and offline fine-tuning on self-play data in the hardcore competitive setting of Pokémon's four oldest (and most partially observed) game generations. The resulting agents outperform a recent LLM Agent approach and a strong heuristic search engine. While playing anonymously in online battles against humans, our best agents climb to rankings inside the top 10% of active players. All agent checkpoints, training details, datasets, and baselines are available at https://metamon.tech.
