Exploiting Semantic Epsilon Greedy Exploration Strategy in Multi-Agent Reinforcement Learning

Abstract

Multi-agent reinforcement learning (MARL) can model many real world applications. However, many MARL approaches rely on epsilon greedy for exploration, which may discourage visiting advantageous states in hard scenarios. In this paper, we propose a new approach QMIX(SEG) for tackling MARL. It makes use of the value function factorization method QMIX to train per-agent policies and a novel Semantic Epsilon Greedy (SEG) exploration strategy. SEG is a simple extension to the conventional epsilon greedy exploration strategy, yet it is experimentally shown to greatly improve the performance of MARL. We first cluster actions into groups of actions with similar effects and then use the groups in a bi-level epsilon greedy exploration hierarchy for action selection. We argue that SEG facilitates semantic exploration by exploring in the space of groups of actions, which have richer semantic meanings than atomic actions. Experiments show that QMIX(SEG) largely outperforms QMIX and leads to strong performance competitive with current state-of-the-art MARL approaches on the StarCraft Multi-Agent Challenge (SMAC) benchmark.
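
To make the bi-level idea concrete, the following is a minimal sketch of one possible reading of the action selection step, not the authors' implementation. The abstract does not specify how the two levels interact, so the function name `seg_select`, the two separate exploration rates `eps_group` and `eps_action`, and the rule of falling back to the group that contains the greedy action are all assumptions made for illustration. The action groups are taken as given; the clustering step is not shown.

```python
import random


def seg_select(q_values, groups, eps_group, eps_action, rng=random):
    """Hypothetical bi-level epsilon greedy selection for a single agent.

    q_values:   list of per-action value estimates (index = action id).
    groups:     list of lists of action ids, assumed to come from a prior
                clustering of actions with similar effects.
    eps_group:  probability of exploring at the group level (assumed name).
    eps_action: probability of exploring inside the chosen group (assumed name).
    rng:        any object exposing random() and choice(), e.g. random.Random.
    """
    # Greedy action over the full action set.
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])

    # Level 1: explore over groups, or keep the group holding the greedy action.
    if rng.random() < eps_group:
        group = rng.choice(groups)
    else:
        group = next(g for g in groups if greedy in g)

    # Level 2: explore uniformly inside the group, or act greedily within it.
    if rng.random() < eps_action:
        return rng.choice(group)
    return max(group, key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3, 0.2]
    groups = [[0, 1], [2, 3]]  # illustrative grouping, not taken from the paper
    print(seg_select(q, groups, eps_group=0.3, eps_action=0.1))
```

Under these assumptions, setting both rates to zero recovers purely greedy selection, while a nonzero `eps_group` with `eps_action = 0` always picks the best-valued action of a randomly chosen group, which is one way exploration could operate over groups rather than atomic actions.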
