The challenge of hidden gifts in multi-agent reinforcement learning

  • 2025-07-30 01:18:05
  • Dane Malenfant, Blake A. Richards

Abstract

Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These "hidden gifts" represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, so the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key; thus, the act of dropping the key for others is a "hidden gift". We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task even with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of "hidden gifts", and demonstrate that learning awareness in independent agents can benefit these settings.
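
To make the reward structure concrete, the following is a minimal Python sketch of a hypothetical, grid-free abstraction of the key-and-door task; it is not the authors' environment. Movement is dropped entirely, and the class `HiddenGiftToy`, the helper `augment_with_history`, the action names ("pickup", "unlock", "drop", "noop"), and the reward values are all assumptions made for illustration. The abstract only states that the collective reward is larger than the individual one.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical reward magnitudes: the abstract only says the collective
# reward is larger than the individual one.
INDIVIDUAL_REWARD = 1.0
COLLECTIVE_REWARD = 5.0

Obs = Tuple[bool, bool, bool]


class HiddenGiftToy:
    """Hypothetical grid-free toy of the one-key, many-doors task (no movement)."""

    def __init__(self, n_agents: int = 2) -> None:
        self.n_agents = n_agents
        self.reset()

    def reset(self) -> List[Obs]:
        self.holder: Optional[int] = None  # agent holding the key, None if on the floor
        self.doors_open = [False] * self.n_agents
        self.collective_paid = False
        return [self.observe(i) for i in range(self.n_agents)]

    def observe(self, i: int) -> Obs:
        # (I hold the key, my door is open, key is on the floor).
        # Nothing records who dropped the key, so a dropped key looks the
        # same as one nobody has picked up yet: the gift stays hidden.
        return (self.holder == i, self.doors_open[i], self.holder is None)

    def step(self, actions: Sequence[str]) -> Tuple[List[Obs], List[float]]:
        """actions[i] is one of "noop", "pickup", "unlock", "drop".

        Actions are resolved in agent-index order within a single step.
        """
        rewards = [0.0] * self.n_agents
        for i, a in enumerate(actions):
            if a == "pickup" and self.holder is None:
                self.holder = i
            elif a == "unlock" and self.holder == i and not self.doors_open[i]:
                self.doors_open[i] = True
                rewards[i] += INDIVIDUAL_REWARD
            elif a == "drop" and self.holder == i:
                # Dropping earns nothing now; it only enables others' rewards.
                self.holder = None
        # Collective reward is paid once, to every agent, when all doors are open.
        if all(self.doors_open) and not self.collective_paid:
            self.collective_paid = True
            rewards = [r + COLLECTIVE_REWARD for r in rewards]
        return [self.observe(i) for i in range(self.n_agents)], rewards


def augment_with_history(obs: Tuple, history: Sequence[str], k: int = 4) -> Tuple:
    """Hypothetical helper: append the agent's own last k actions, left-padded with "none"."""
    recent = list(history[-k:]) if k > 0 else []
    padded = ["none"] * (k - len(recent)) + recent
    return tuple(obs) + tuple(padded)


if __name__ == "__main__":
    env = HiddenGiftToy(n_agents=2)
    env.reset()
    script = [
        ["pickup", "noop"],  # agent 0 takes the key
        ["unlock", "noop"],  # agent 0 opens its own door -> [1.0, 0.0]
        ["drop", "noop"],    # the hidden gift: no reward for agent 0 -> [0.0, 0.0]
        ["noop", "pickup"],  # agent 1 finds the key on the floor
        ["noop", "unlock"],  # agent 1 opens its door; collective bonus -> [5.0, 6.0]
    ]
    for actions in script:
        _, rewards = env.step(actions)
        print(actions, rewards)
```

In this rollout, agent 0's drop earns it nothing at the moment it happens, and agent 1's observation after the drop is identical to its observation at reset, so from its own inputs it cannot tell that the key on the floor is a gift. The payoff for the drop arrives only later, when agent 1 unlocks its door. The `augment_with_history` helper sketches one possible form of the own-action-history input the abstract mentions, under the assumption that a fixed window of recent actions is enough; the paper's actual encoding may differ.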
