Abstract
Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, we introduce HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. We also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. We integrate state-of-the-art MARL algorithms and LLM-based agents with our benchmark and conduct comprehensive experiments. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.