Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning

Abstract

The discovery of individual objectives in the collective behavior of complex dynamical systems such as fish schools and bacterial colonies is a long-standing challenge. Inverse reinforcement learning is a potent approach for addressing this challenge, but its applicability to dynamical systems, involving continuous state-action spaces and multiple interacting agents, has been limited. In this study, we tackle this challenge by introducing an off-policy inverse multi-agent reinforcement learning algorithm (IMARL). Our approach combines the ReF-ER techniques with guided cost learning. By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents. Through extensive experimentation, we demonstrate that the proposed policy captures the behavior observed in the provided data and achieves promising results across problem domains, including single-agent models in the OpenAI Gym and multi-agent models of schooling behavior. The present study shows that the proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents, and showcases its value as a tool for studying complex physical systems exhibiting collective behavior.
