Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games

Abstract

In multi-agent reinforcement learning (MARL), independent learners are those that do not observe the actions of other agents in the system. Due to the decentralization of information, it is challenging to design independent learners that drive play to equilibrium. This paper investigates the feasibility of using satisficing dynamics to guide independent learners to approximate equilibrium in stochastic games. For $\epsilon \geq 0$, an $\epsilon$-satisficing policy update rule is any rule that instructs the agent to not change its policy when it is $\epsilon$-best-responding to the policies of the remaining players; $\epsilon$-satisficing paths are defined to be sequences of joint policies obtained when each agent uses some $\epsilon$-satisficing policy update rule to select its next policy. We establish structural results on the existence of $\epsilon$-satisficing paths into $\epsilon$-equilibrium in both symmetric $N$-player games and general stochastic games with two players. We then present an independent learning algorithm for $N$-player symmetric games and give high probability guarantees of convergence to $\epsilon$-equilibrium under self-play. This guarantee is made using symmetry alone, leveraging the previously unexploited structure of $\epsilon$-satisficing paths.
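The definition of an $\epsilon$-satisficing update rule can be illustrated with a minimal sketch. It is not the paper's algorithm, and it rests on several simplifying assumptions that are not in the abstract: the game is a two-player, single-state game given as a payoff table, policies are pure actions, agents maximize reward, and an unsatisfied agent picks its next action uniformly at random (the definition allows any choice there). The names `is_eps_best_response`, `satisficing_step`, `satisficing_path`, and `COORDINATION` are hypothetical.

```python
import random

# Hypothetical 2x2 coordination game: payoffs[player][a0][a1] is the reward
# to `player` when player 0 plays a0 and player 1 plays a1 (assumed layout).
COORDINATION = [
    [[1, 0], [0, 1]],  # rewards to player 0
    [[1, 0], [0, 1]],  # rewards to player 1
]

def num_actions(payoffs, player):
    """Number of actions available to `player` in the payoff table."""
    return len(payoffs[0]) if player == 0 else len(payoffs[0][0])

def is_eps_best_response(payoffs, player, joint, eps):
    """True if joint[player] earns within eps of the best reward
    achievable against the other player's current action."""
    a0, a1 = joint
    if player == 0:
        best = max(payoffs[0][a][a1] for a in range(num_actions(payoffs, 0)))
    else:
        best = max(payoffs[1][a0][a] for a in range(num_actions(payoffs, 1)))
    return payoffs[player][a0][a1] >= best - eps

def satisficing_step(payoffs, joint, eps, rng):
    """One joint update in which every agent uses an eps-satisficing rule."""
    nxt = []
    for player in (0, 1):
        if is_eps_best_response(payoffs, player, joint, eps):
            # Satisfied agents must keep their current policy.
            nxt.append(joint[player])
        else:
            # Unsatisfied agents may change arbitrarily; uniform random
            # choice here is an assumption made for illustration.
            nxt.append(rng.randrange(num_actions(payoffs, player)))
    return tuple(nxt)

def satisficing_path(payoffs, start, eps, steps, seed=0):
    """Generate an eps-satisficing path of joint actions from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(satisficing_step(payoffs, path[-1], eps, rng))
    return path
```

In this sketch, a path started at the equilibrium `(0, 0)` of `COORDINATION` never moves, because both agents are already best-responding. A path started at the miscoordinated profile `(0, 1)` keeps changing until it happens to reach a profile where both agents are satisfied.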
