Abstract
This work presents a Hierarchical Multi-Agent Reinforcement Learning framework for analyzing simulated air combat scenarios involving heterogeneous agents. The objective is to identify effective Courses of Action that lead to mission success within preset simulations, thereby enabling the exploration of real-world defense scenarios at low cost and in a safe-to-fail setting. Applying deep Reinforcement Learning in this context poses specific challenges, such as complex flight dynamics, the exponential size of the state and action spaces in multi-agent systems, and the need to integrate real-time control of individual units with look-ahead planning. To address these challenges, the decision-making process is split into two levels of abstraction: low-level policies control individual units, while a high-level commander policy issues macro commands aligned with the overall mission targets. This hierarchical structure facilitates the training process by exploiting policy symmetries of individual agents and by separating control from command tasks. The low-level policies are trained for individual combat control in a curriculum of increasing complexity. The high-level commander is then trained on mission targets given pre-trained control policies. The empirical validation confirms the advantages of the proposed framework.