SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes

Abstract

In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and a dynamic environment. From the system designer's perspective, it is desirable that agents learn to coordinate towards socially optimal outcomes while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory, and SA-IGA is shown to have linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Building on SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on the Q-learning update rule. Simulation results show that SA-PGA agents can achieve higher social welfare than the previous social-optimality-oriented Conditional Joint Action Learner (CJAL), and are also robust against individually rational opponents, reaching Nash equilibrium solutions in that case.
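The abstract does not spell out how social awareness enters the policy update, so the following Python sketch rests on an explicit assumption: in a two-player, two-action game, each agent performs gradient ascent on a hypothetical blended objective (1 - w) * V_own + w * (V_1 + V_2) / 2, where w is a fixed social weight. The names grad_p, grad_q, sa_iga_step and run, the clipping to [0, 1], and the prisoner's dilemma payoffs are illustrative choices, not taken from the paper; the paper's exact formulation (for example, whether w is adapted during learning) may differ. Because each expected payoff is bilinear in the two mixing probabilities, the gradient is affine in the opponent's probability, which illustrates how linear learning dynamics can arise.

```python
import numpy as np


def grad_p(M, q):
    # dV/dp for V(p, q) = sum_ij P(row=i) P(col=j) M[i, j],
    # where p = P(row plays action 0) and q = P(col plays action 0).
    return q * (M[0, 0] - M[0, 1] - M[1, 0] + M[1, 1]) + (M[0, 1] - M[1, 1])


def grad_q(M, p):
    # dV/dq for the same expected payoff V(p, q).
    return p * (M[0, 0] - M[0, 1] - M[1, 0] + M[1, 1]) + (M[1, 0] - M[1, 1])


def sa_iga_step(p, q, R, C, w1, w2, eta):
    # Hypothetical socially aware objective for agent k:
    #   U_k = (1 - w_k) * V_k + w_k * (V_1 + V_2) / 2
    # Both gradients use the old (p, q), so the update is simultaneous.
    g1 = (1 - w1) * grad_p(R, q) + w1 * 0.5 * (grad_p(R, q) + grad_p(C, q))
    g2 = (1 - w2) * grad_q(C, p) + w2 * 0.5 * (grad_q(C, p) + grad_q(R, p))
    # Keep the mixing probabilities inside [0, 1] by clipping.
    p_new = float(np.clip(p + eta * g1, 0.0, 1.0))
    q_new = float(np.clip(q + eta * g2, 0.0, 1.0))
    return p_new, q_new


def run(R, C, w1, w2, p0=0.5, q0=0.5, eta=0.01, steps=1000):
    # Iterate the update from (p0, q0) and return the final strategies.
    p, q = p0, q0
    for _ in range(steps):
        p, q = sa_iga_step(p, q, R, C, w1, w2, eta)
    return p, q


# Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
R = np.array([[3.0, 0.0], [5.0, 1.0]])  # row player's payoffs
C = R.T                                  # symmetric game: column player's payoffs

print(run(R, C, 0.0, 0.0))  # selfish agents (plain IGA)
print(run(R, C, 1.0, 1.0))  # fully socially aware agents
```

With w = 0 the update reduces to plain IGA: the gradient for cooperating is -q - 1 < 0, so both agents are driven to mutual defection (0.0, 0.0), the Nash equilibrium. With w = 1 each agent ascends the average payoff, whose gradient 0.5 * (3 - 2q) stays positive, so both converge to mutual cooperation (1.0, 1.0). Clipping is only a simple stand-in for projecting onto the probability simplex.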