### Abstract

Reinforcement learning for multi-agent games has attracted lots of attentionrecently. However, given the challenge of solving Nash equilibria for largepopulation games, existing works with guaranteed polynomial complexities eitherfocus on variants of zero-sum and potential games, or aim at solving (coarse)correlated equilibria, or require access to simulators, or rely on certainassumptions that are hard to verify. This work proposes MF-OML (Mean-FieldOccupation-Measure Learning), an online mean-field reinforcement learningalgorithm for computing approximate Nash equilibria of large populationsequential symmetric games. MF-OML is the first fully polynomial multi-agentreinforcement learning algorithm for provably solving Nash equilibria (up tomean-field approximation gaps that vanish as the number of players $N$ goes toinfinity) beyond variants of zero-sum and potential games. When evaluated bythe cumulative deviation from Nash equilibria, the algorithm is shown toachieve a high probability regret bound of $\tilde{O}(M^{3/4}+N^{-1/2}M)$ forgames with the strong Lasry-Lions monotonicity condition, and a regret bound of$\tilde{O}(M^{11/12}+N^{- 1/6}M)$ for games with only the Lasry-Lionsmonotonicity condition, where $M$ is the total number of episodes and $N$ isthe number of agents of the game. As a byproduct, we also obtain the firsttractable globally convergent computational algorithm for computing approximateNash equilibria of monotone mean-field games.