On Reinforcement Learning for Full-length Game of StarCraft

Abstract

StarCraft II poses a grand challenge for reinforcement learning. Its main difficulties include a huge state and action space and a long time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-action, automatically extracted from experts' trajectories, which reduces the action space by an order of magnitude yet remains effective. The other is a two-layer hierarchical architecture that is modular and easy to scale, enabling curriculum transfer from simpler tasks to more complex ones. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and using a restricted set of units, we achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a winning rate of over 93% for Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days on a single machine with only 48 CPU cores and 8 K40 GPUs. The approach also shows strong generalization performance when tested against never-seen opponents, including the cheating-level built-in AI and all levels of the Zerg and Protoss built-in AI. We hope this study can shed some light on future research in large-scale reinforcement learning.
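To make the two levels of abstraction concrete, the following is a minimal sketch of a two-layer policy over macro-actions. It is illustrative only and not the implementation used in the paper: the macro-action names, the class names `SubPolicy` and `Controller`, the decision interval of 8 steps, and the uniform random choices standing in for learned policies are all assumptions.

```python
import random

# Hypothetical macro-action sets; the paper extracts its own from expert trajectories.
MACRO_ACTIONS = {
    "base": ["build_worker", "build_pylon", "build_gateway"],
    "combat": ["attack", "retreat"],
}


class SubPolicy:
    """Bottom layer: picks one macro-action from its own small action set."""

    def __init__(self, actions, rng):
        self.actions = list(actions)
        self.rng = rng

    def act(self, observation):
        # Placeholder for a learned policy: uniform random choice.
        return self.rng.choice(self.actions)


class Controller:
    """Top layer: every `interval` steps, chooses which sub-policy is active."""

    def __init__(self, sub_policies, interval, rng):
        self.sub_policies = sub_policies
        self.interval = interval  # assumed value, not taken from the abstract
        self.rng = rng
        self.current = None

    def act(self, step, observation):
        # Re-select the active sub-policy only at interval boundaries.
        if self.current is None or step % self.interval == 0:
            self.current = self.rng.choice(sorted(self.sub_policies))
        macro_action = self.sub_policies[self.current].act(observation)
        return self.current, macro_action


rng = random.Random(0)
controller = Controller(
    {name: SubPolicy(actions, rng) for name, actions in MACRO_ACTIONS.items()},
    interval=8,
    rng=rng,
)
# Each entry is (active sub-policy name, chosen macro-action).
trace = [controller.act(step, observation=None) for step in range(16)]
```

Because each sub-policy only sees its own macro-action set, a module can in principle be retrained or swapped without touching the others, which is the kind of modularity the abstract attributes to the architecture.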
