Shaping Advice in Deep Multi-Agent Reinforcement Learning

Abstract

Multi-agent reinforcement learning involves multiple agents interacting with each other and a shared environment to complete tasks. When rewards provided by the environment are sparse, agents may not receive immediate feedback on the quality of actions that they take, thereby affecting learning of policies. In this paper, we propose a method called Shaping Advice in deep Multi-agent reinforcement learning (SAM) to augment the reward signal from the environment with an additional reward termed shaping advice. The shaping advice is given by a difference of potential functions at consecutive time-steps. Each potential function is a function of observations and actions of the agents. The shaping advice needs to be specified only once at the start of training, and can be easily provided by non-experts. We show through theoretical analyses and experimental validation that shaping advice provided by SAM does not distract agents from completing tasks specified by the environment reward. Theoretically, we prove that convergence of policy gradients and value functions when using SAM implies convergence of these quantities in the absence of SAM. Experimentally, we evaluate SAM on three tasks in the multi-agent Particle World environment that have sparse rewards. We observe that using SAM results in agents learning policies to complete tasks faster, and obtain higher rewards than: i) using sparse rewards alone; ii) a state-of-the-art reward redistribution method.
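
To make the idea of shaping advice concrete, the following is a minimal sketch of a reward term computed as a difference of potential functions at consecutive time-steps, for a single agent. It is an illustration and not the paper's implementation: the function names, the discount factor gamma (a common convention in potential-based shaping, assumed here), and the example potential (negative distance to a landmark) are all hypothetical.

```python
from typing import Callable, Sequence

# A potential maps one agent's (observation, action) pair to a scalar.
Potential = Callable[[Sequence[float], int], float]


def example_potential(obs: Sequence[float], action: int) -> float:
    """Hypothetical potential: negative Euclidean distance to a landmark.

    Assumes obs[0], obs[1] hold the landmark's position relative to the agent.
    The action is ignored here, although a potential may depend on it.
    """
    dx, dy = obs[0], obs[1]
    return -((dx * dx + dy * dy) ** 0.5)


def shaping_advice(
    potential: Potential,
    obs: Sequence[float],
    action: int,
    next_obs: Sequence[float],
    next_action: int,
    gamma: float = 0.99,  # assumed discount factor, not taken from the source
) -> float:
    """Difference of potentials at consecutive time-steps."""
    return gamma * potential(next_obs, next_action) - potential(obs, action)


def shaped_reward(env_reward: float, advice: float) -> float:
    """Augment the (possibly sparse) environment reward with the advice."""
    return env_reward + advice
```

With this example potential, an agent that moves closer to the landmark receives positive advice even when the environment reward is zero, and an agent that moves away receives negative advice.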
