Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Abstract

Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and exponentially increasing policy space. It would be even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, i.e., Fever Basketball Defense.
