Abstract
Multi-agent credit assignment is a fundamental challenge for cooperative multi-agent reinforcement learning (MARL), where a team of agents learns from shared reward signals. The Individual-Global-Max (IGM) condition is a widely used principle for multi-agent credit assignment, requiring that the joint action determined by individual Q-functions maximizes the global Q-value. Meanwhile, the principle of maximum entropy has been leveraged to enhance exploration in MARL. However, we identify a critical limitation in existing maximum entropy MARL methods: a misalignment arises between local policies and the joint policy that maximizes the global Q-value, leading to violations of the IGM condition. To address this misalignment, we propose an order-preserving transformation. Building on it, we introduce ME-IGM, a novel maximum entropy MARL algorithm that is compatible with any credit assignment mechanism satisfying the IGM condition while enjoying the benefits of maximum entropy exploration. We empirically evaluate two variants of ME-IGM, ME-QMIX and ME-QPLEX, in non-monotonic matrix games, and demonstrate their state-of-the-art performance across 17 scenarios in SMAC-v2 and Overcooked.
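To make the IGM condition concrete, the following is a minimal illustrative sketch (not the ME-IGM method) that checks IGM in a hypothetical two-agent matrix game: the joint action formed by each agent's greedy local action must be a maximizer of the global Q-value. The function names `greedy` and `satisfies_igm`, the additive decomposition, and the payoff matrix are assumptions chosen for illustration only.

```python
def greedy(q_values):
    """Return the index of the largest utility (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def satisfies_igm(q_tot, q_individual):
    """Check the IGM condition for a two-agent matrix game (illustrative only).

    q_tot[a1][a2] is the global Q-value of joint action (a1, a2);
    q_individual[i][a] is agent i's local utility for its own action a.
    IGM holds if the tuple of individual greedy actions attains max q_tot.
    """
    a1, a2 = (greedy(q) for q in q_individual)
    best = max(max(row) for row in q_tot)
    return q_tot[a1][a2] == best


# Hypothetical additive factorization: Q_tot = Q_1 + Q_2, so IGM holds.
q1, q2 = [1.0, 3.0], [0.5, 2.0]
additive = [[u + v for v in q2] for u in q1]
print(satisfies_igm(additive, [q1, q2]))  # True

# Hypothetical non-monotonic payoff with local utilities taken as marginal
# averages: the greedy local actions pick (1, 1) with value 0, not the optimum 8.
payoff = [[8, -12, -12], [-12, 0, 0], [-12, 0, 0]]
row_means = [sum(row) / 3 for row in payoff]
col_means = [sum(col) / 3 for col in zip(*payoff)]
print(satisfies_igm(payoff, [row_means, col_means]))  # False
```

The second case shows how local utilities that do not preserve the ordering of the global Q-values can violate IGM in a non-monotonic game.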