ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation

Abstract

In a multirobot system, a number of cyber-physical attacks (e.g., communication hijacking, observation perturbations) can challenge the robustness of agents. This robustness issue worsens in multiagent reinforcement learning (MARL) because the environment is non-stationary: agents learn simultaneously, and their changing policies affect the transition and reward functions. In this paper, we propose a minimax MARL approach to infer the worst-case policy update of other agents. As the minimax formulation is computationally intractable to solve, we apply the convex relaxation of neural networks to solve the inner minimization problem. Such convex relaxation enables robustness in interacting with peer agents that may have significantly different behaviors and also achieves a certified bound of the original optimization problem. We evaluate our approach on multiple mixed cooperative-competitive tasks and show that our method outperforms the previous state-of-the-art approaches on this topic.
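To make the idea of a certified bound on the inner minimization concrete, here is a minimal sketch under stated assumptions. It assumes a critic Q(s, a_i, a_-i) implemented as a small ReLU MLP and models the other agents' possible policy updates as an ℓ∞ ball of radius eps around their current actions. It uses interval bound propagation (IBP), the simplest and loosest convex outer relaxation of a ReLU network; this is a swapped-in illustration, not necessarily the relaxation ROMAX uses. The function names, the input layout, and the randomly initialized network are hypothetical.

```python
import random

def affine_interval(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b (sound per-coordinate bounds)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, xl, xh in zip(row, lo, hi):
            # A non-negative weight maps lower input to lower output; a negative weight swaps them.
            if w >= 0:
                l += w * xl
                h += w * xh
            else:
                l += w * xh
                h += w * xl
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_interval(lo, hi):
    # ReLU is monotone, so applying it to both endpoints keeps the box sound.
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def forward(layers, x):
    """Exact forward pass of the ReLU MLP (no activation on the output layer)."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def certified_q_lower_bound(layers, state, own_action, other_action, eps):
    """Lower bound on min Q over other-agent actions within +/- eps (hypothetical helper)."""
    # Assumed input layout: state, own action (fixed), other agents' actions (perturbed).
    lo = state + own_action + [a - eps for a in other_action]
    hi = state + own_action + [a + eps for a in other_action]
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_interval(W, b, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu_interval(lo, hi)
    return lo[0]

# Hypothetical critic: 5 inputs -> 8 -> 8 -> 1 scalar Q-value, random weights.
random.seed(0)
dims = [5, 8, 8, 1]
layers = [([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
           [random.uniform(-0.5, 0.5) for _ in range(n_out)])
          for n_in, n_out in zip(dims[:-1], dims[1:])]

state, own, other, eps = [0.3, -0.2], [0.5], [0.1, -0.4], 0.1
bound = certified_q_lower_bound(layers, state, own, other, eps)

# Empirical check: no sampled perturbation of the other agents' actions beats the bound.
sampled = min(
    forward(layers, state + own + [a + random.uniform(-eps, eps) for a in other])[0]
    for _ in range(1000)
)
print(bound, sampled, bound <= sampled)
```

Sampling only gives an upper estimate of the true worst case, whereas the relaxation gives a guaranteed lower bound on it; in a minimax objective, the agent can then optimize against that bound instead of the intractable exact inner minimum. Tighter linear relaxations trade extra computation for a smaller gap between the two.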
