Abstract
Multi-agent reinforcement learning (MARL) focuses on training the behaviors of multiple learning agents that coexist in a shared environment. Recently, MARL models such as the Multi-Agent Transformer (MAT) and ACtion dEpendent deep Q-learning (ACE) have significantly improved performance by leveraging sequential decision-making processes. Although these models enhance performance, they do not explicitly consider the importance of the order in which agents make decisions. In this paper, we propose the Agent Order of Action Decisions-MAT (AOAD-MAT), a novel MAT model that considers the order in which agents make decisions. The proposed model explicitly incorporates the sequence of action decisions into the learning process, allowing it to learn and predict the optimal order of agent actions. AOAD-MAT leverages a Transformer-based actor-critic architecture that dynamically adjusts the sequence of agent actions. To achieve this, we introduce a novel MARL architecture that cooperates with a subtask focused on predicting the next agent to act, integrated into a Proximal Policy Optimization (PPO) based loss function to synergistically maximize the advantage of sequential decision-making. The proposed method was validated through extensive experiments on the StarCraft Multi-Agent Challenge and Multi-Agent MuJoCo benchmarks. The experimental results show that the proposed AOAD-MAT model outperforms existing MAT and other baseline models, demonstrating the effectiveness of adjusting the agent order of action decisions in MARL.
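
As an illustration of how a next-agent prediction subtask might be folded into a Proximal Policy Optimization based loss, the following minimal sketch adds a cross-entropy term over the index of the next agent to act to the standard clipped surrogate objective. It is a sketch under stated assumptions, not the AOAD-MAT loss itself: the function names, the weighting coefficient `order_coef`, and the use of given next-agent labels are all hypothetical and are not specified in this abstract.

```python
import math


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against the index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def combined_loss(ratios, advantages, order_logits, next_agents,
                  eps=0.2, order_coef=0.5):
    """Clipped policy loss plus a weighted next-agent prediction loss.

    ratios, advantages: per-sample probability ratios and advantage estimates.
    order_logits:       per-sample scores over candidate agents (hypothetical head).
    next_agents:        per-sample index of the agent that acts next (assumed given).
    order_coef:         assumed weight of the auxiliary subtask.
    """
    n = len(ratios)
    # Negate the surrogate because it is maximized while the loss is minimized.
    policy_loss = -sum(ppo_clip_term(r, a, eps)
                       for r, a in zip(ratios, advantages)) / n
    order_loss = sum(cross_entropy(l, t)
                     for l, t in zip(order_logits, next_agents)) / n
    return policy_loss + order_coef * order_loss
```

Minimizing the sum trains the policy and the order predictor jointly. Setting `order_coef` to zero recovers the plain clipped objective, which makes the contribution of the auxiliary subtask easy to isolate in an ablation.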