Abstract
Multi-agent robust reinforcement learning, also known as multi-player robustMarkov games (RMGs), is a crucial framework for modeling competitiveinteractions under environmental uncertainties, with wide applications inmulti-agent systems. However, existing results on sample complexity in RMGssuffer from at least one of three obstacles: restrictive range of uncertaintylevel or accuracy, the curse of multiple agents, and the barrier of longhorizons, all of which cause existing results to significantly exceed theinformation-theoretic lower bound. To close this gap, we extend the Q-FTRLalgorithm \citep{li2022minimax} to the RMGs in finite-horizon setting, assumingaccess to a generative model. We prove that the proposed algorithm achieves an$\varepsilon$-robust coarse correlated equilibrium (CCE) with a samplecomplexity (up to log factors) of$\widetilde{O}\left(H^3S\sum_{i=1}^mA_i\min\left\{H,1/R\right\}/\varepsilon^2\right)$,where $S$ denotes the number of states, $A_i$ is the number of actions of the$i$-th agent, $H$ is the finite horizon length, and $R$ is uncertainty level.We also show that this sample compelxity is minimax optimal by combining aninformation-theoretic lower bound. Additionally, in the special case oftwo-player zero-sum RMGs, the algorithm achieves an $\varepsilon$-robust Nashequilibrium (NE) with the same sample complexity.