Abstract
Deep reinforcement learning methods for multi-agent systems make optimal decisions that depend on the states observed by agents, but even a small uncertainty in these observations can mislead agents into taking wrong actions. Mean-field actor-critic reinforcement learning (MFAC) is well known in the multi-agent field because it effectively handles the scalability problem. However, this paper finds that it is also sensitive to state perturbations, which can significantly degrade the team rewards. This paper proposes a robust learning framework for MFAC called RoMFAC that has two innovations: 1) a new objective function for training actors, composed of a \emph{policy gradient function} that is related to the expected cumulative discounted reward on sampled clean states and an \emph{action loss function} that represents the difference between actions taken on clean and adversarial states; and 2) a repetitive regularization of the action loss that ensures the trained actors achieve good performance. Furthermore, we prove that the proposed action loss function is convergent. Experiments show that RoMFAC is robust against adversarial perturbations while maintaining its good performance in environments without perturbations.
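To illustrate how a two-part actor objective of this kind can be assembled, the following is a minimal sketch and not the paper's implementation. It assumes a discrete action space, a standard log-probability policy gradient surrogate weighted by advantages, and a KL divergence between the action distributions on clean and perturbed states as the action loss. The function names, the weight `kappa`, and the choice of KL are all hypothetical. The repetitive regularization schedule and the generation of adversarial states are not shown.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def actor_objective(logits_clean, logits_adv, actions, advantages, kappa=1.0):
    """Hypothetical combined actor loss (an assumption, not the paper's formula).

    logits_clean: actor outputs on clean states, shape (batch, n_actions)
    logits_adv:   actor outputs on perturbed states, same shape
    actions:      integer actions taken on the clean states, shape (batch,)
    advantages:   advantage or return estimates, shape (batch,)
    kappa:        assumed weight on the action loss
    """
    logp_clean = log_softmax(logits_clean)
    logp_adv = log_softmax(logits_adv)
    idx = np.arange(len(actions))

    # Policy gradient surrogate, evaluated on clean states only.
    pg_loss = -np.mean(logp_clean[idx, actions] * advantages)

    # Action loss: KL(pi(.|clean) || pi(.|adversarial)), averaged over the batch.
    p_clean = np.exp(logp_clean)
    action_loss = np.mean(np.sum(p_clean * (logp_clean - logp_adv), axis=-1))

    return pg_loss + kappa * action_loss, pg_loss, action_loss
```

In this sketch the action loss vanishes when the actor produces the same action distribution on clean and perturbed states, so the added term only penalizes policies whose behavior changes under perturbation.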