Abstract
The discovery of individual objectives in the collective behavior of complex dynamical systems, such as fish schools and bacterial colonies, is a long-standing challenge. Inverse reinforcement learning is a potent approach for addressing this challenge, but its applicability to dynamical systems, which involve continuous state-action spaces and multiple interacting agents, has been limited. In this study, we tackle this challenge by introducing an off-policy inverse multi-agent reinforcement learning algorithm (IMARL). Our approach combines the ReF-ER techniques with guided cost learning. By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents. Through extensive experimentation, we demonstrate that the proposed policy captures the behavior observed in the provided data and achieves promising results across problem domains, including single-agent models in the OpenAI Gym and multi-agent models of schooling behavior. The present study shows that the proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents, and showcases its value as a tool for studying complex physical systems exhibiting collective behavior.