Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration

Abstract

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms: the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are embeddings of local action-observation histories and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function capture the novelty of states and the influence from other agents, our intrinsic reward can induce coordinated exploration of new or promising states. We illustrate the advantages of our method with didactic examples, and demonstrate that it significantly outperforms state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.
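
The two ingredients named above, an intrinsic reward built from prediction errors of individual Q-values and an episodic memory of informative experience, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the L2 norm, the averaging over agents, the table of best discounted returns, and the names `intrinsic_reward` and `EpisodicMemory` are all hypothetical and may differ from what the paper actually uses.

```python
from typing import Dict, Hashable, Sequence


def intrinsic_reward(predicted_q: Sequence[Sequence[float]],
                     target_q: Sequence[Sequence[float]]) -> float:
    """Curiosity signal: mean over agents of the L2 prediction error.

    predicted_q[i] and target_q[i] hold agent i's individual Q-values,
    one entry per action. The norm and the averaging are assumptions.
    """
    if len(predicted_q) != len(target_q) or not predicted_q:
        raise ValueError("need the same non-zero number of agents in both inputs")
    total = 0.0
    for pred, tgt in zip(predicted_q, target_q):
        if len(pred) != len(tgt):
            raise ValueError("each agent needs matching action dimensions")
        squared_error = sum((p - t) ** 2 for p, t in zip(pred, tgt))
        total += squared_error ** 0.5
    return total / len(predicted_q)


class EpisodicMemory:
    """Hypothetical table keeping the best discounted return seen per state key."""

    def __init__(self) -> None:
        self._best: Dict[Hashable, float] = {}

    def update_from_episode(self, state_keys: Sequence[Hashable],
                            rewards: Sequence[float],
                            gamma: float = 0.99) -> None:
        """Walk the episode backwards and keep the highest return per key."""
        if len(state_keys) != len(rewards):
            raise ValueError("state_keys and rewards must have equal length")
        ret = 0.0
        for key, reward in zip(reversed(state_keys), reversed(rewards)):
            ret = reward + gamma * ret
            if ret > self._best.get(key, float("-inf")):
                self._best[key] = ret

    def lookup(self, key: Hashable, default: float = 0.0) -> float:
        """Return the stored best return, or `default` for unseen keys."""
        return self._best.get(key, default)
```

In a training loop of this kind, the intrinsic reward would typically be scaled and added to the environment reward, while the stored returns could serve as an extra regression target for the value function. Both uses are assumptions about how such a sketch might be wired in, not a description of EMC itself.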
