Prosocial learning agents solve generalized Stag Hunts better than selfish ones

Abstract

Deep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training: applying standard RL methods while treating other agents as part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents choose either a risky cooperative policy (which leads to high payoffs if both choose it but low payoffs to an agent who attempts it alone) or a safe one (which leads to a safe payoff no matter what). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. We extend existing work on reward-shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Thus, even if we control only a single agent in a group, making that agent prosocial can increase our agent's long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
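
To make the Stag Hunt structure and the effect of prosociality concrete, here is a minimal Python sketch of a two-player matrix game. Everything in it is an assumption made for illustration: the payoff numbers, the names `PAYOFF`, `shaped_reward`, and `stag_threshold`, and the additive shaping rule `own + alpha * partner` with a hypothetical prosociality weight `alpha`. The abstract does not specify the paper's exact formulation, so this should not be read as the authors' method.

```python
# Illustrative two-player Stag Hunt. Payoff values are assumed, not from the paper.
STAG, HARE = 0, 1

# (my_action, partner_action) -> my reward
PAYOFF = {
    (STAG, STAG): 4.0,  # both cooperate: high payoff
    (STAG, HARE): 0.0,  # attempt stag alone: low payoff
    (HARE, STAG): 3.0,  # safe action: same payoff no matter what
    (HARE, HARE): 3.0,
}


def shaped_reward(mine, partner, alpha):
    """Own reward plus alpha times the partner's reward (assumed shaping rule)."""
    own = PAYOFF[(mine, partner)]
    other = PAYOFF[(partner, mine)]
    return own + alpha * other


def stag_threshold(alpha):
    """Belief p (probability the partner plays STAG) at which STAG and HARE
    give the same expected shaped reward. Above it, STAG is the better reply."""
    ss = shaped_reward(STAG, STAG, alpha)
    sh = shaped_reward(STAG, HARE, alpha)
    hs = shaped_reward(HARE, STAG, alpha)
    hh = shaped_reward(HARE, HARE, alpha)
    # Solve p*ss + (1-p)*sh == p*hs + (1-p)*hh for p.
    return (hh - sh) / (ss - sh - hs + hh)


if __name__ == "__main__":
    print("selfish threshold:  ", stag_threshold(0.0))
    print("prosocial threshold:", stag_threshold(1.0))
```

Under these assumed payoffs, a selfish agent (`alpha = 0`) prefers the risky action only if it believes its partner cooperates with probability above 0.75, while a fully prosocial agent (`alpha = 1`) does so above 0.375. This only illustrates why a prosocial agent may be more willing to attempt cooperation; it does not by itself establish the convergence result claimed in the abstract.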
