RoMFAC: A Robust Mean-Field Actor-Critic Reinforcement Learning against Adversarial Perturbations on States

Abstract

Deep reinforcement learning methods for multi-agent systems make optimal decisions based on the states observed by agents, but even slight uncertainty in those observations can mislead agents into taking wrong actions. Mean-field actor-critic reinforcement learning (MFAC) is widely used in the multi-agent field because it effectively handles the scalability problem. However, this paper finds that MFAC is also sensitive to state perturbations, which can significantly degrade team rewards. This paper proposes a robust learning framework for MFAC, called RoMFAC, that has two innovations: 1) a new objective function for training actors, composed of a policy gradient function that is related to the expected cumulative discounted reward on sampled clean states and an action loss function that represents the difference between the actions taken on clean and adversarial states; and 2) a repetitive regularization of the action loss that ensures the trained actors achieve good performance. Furthermore, we prove that the proposed action loss function is convergent. Experiments show that RoMFAC is robust against adversarial perturbations while maintaining good performance in environments without perturbations.
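To make the structure of the actor objective concrete, the following is a minimal sketch and not the paper's implementation. It assumes a discrete action space, measures the action loss as a KL divergence between the action distributions on clean and adversarial states, and combines the two terms with a fixed weight. The KL choice, the function names, and the `weight` parameter are all illustrative assumptions; the abstract does not specify the exact form of either term, and the repetitive regularization schedule is not modeled here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities for a list of logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def policy_gradient_term(clean_logits, action, advantage):
    """Score-function term on a clean state: log pi(a|s) * advantage."""
    return log_softmax(clean_logits)[action] * advantage


def action_loss(clean_logits, adv_logits):
    """Assumed form: KL(pi(.|s_clean) || pi(.|s_adv)) over discrete actions."""
    lp = log_softmax(clean_logits)
    lq = log_softmax(adv_logits)
    return sum(math.exp(p) * (p - q) for p, q in zip(lp, lq))


def actor_objective(clean_logits, adv_logits, action, advantage, weight):
    """Loss to minimize: negative policy-gradient term plus weighted action loss.

    `weight` is a hypothetical fixed coefficient standing in for the
    regularization strength.
    """
    pg = policy_gradient_term(clean_logits, action, advantage)
    return -pg + weight * action_loss(clean_logits, adv_logits)
```

In this sketch the action loss is zero when the policy outputs the same distribution on the clean and perturbed states, so the second term penalizes only the change in behavior caused by the perturbation.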
