Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control

Abstract

Flocking control is a challenging problem in which multiple agents, such as drones or vehicles, need to reach a target position while maintaining the flock and avoiding collisions with obstacles and with one another. Multi-agent reinforcement learning (MARL) has achieved promising performance in flocking control. However, methods based on traditional reinforcement learning require a considerable number of interactions between agents and the environment. This paper proposes a sub-optimal policy aided multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample efficiency. SPA-MARL directly leverages a prior policy, which can be manually designed or solved with a non-learning method, to aid agents in learning; the performance of this prior policy can be sub-optimal. SPA-MARL recognizes the difference in performance between the sub-optimal policy and itself, and then imitates the sub-optimal policy if the sub-optimal policy is better. We leverage SPA-MARL to solve the flocking control problem. A traditional control method based on artificial potential fields is used to generate a sub-optimal policy. Experiments demonstrate that SPA-MARL can speed up the training process and outperform both the MARL baseline and the sub-optimal policy used to aid training.
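
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a 2-D workspace, a textbook artificial potential field (linear attraction to the target plus short-range repulsion from obstacles) as the prior policy, and pre-computed return estimates for deciding when to imitate. The function names `apf_action` and `imitation_mask`, the gains `k_att` and `k_rep`, and the `influence` radius are all hypothetical; the abstract does not state how SPA-MARL measures the performance difference.

```python
import math


def apf_action(pos, target, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """Hypothetical APF prior policy: return a 2-D velocity command (vx, vy)."""
    # Attractive term: pulls the agent straight toward the target.
    fx = k_att * (target[0] - pos[0])
    fy = k_att * (target[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Repulsive term: acts only inside the influence radius and
        # points away from the obstacle.
        if 0.0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy


def imitation_mask(prior_returns, learner_returns):
    """Assumed gating rule: 1.0 where the prior's estimated return is higher.

    A mask like this could weight an imitation loss so that the learner
    copies the prior policy only on samples where the prior looks better.
    """
    return [1.0 if p > l else 0.0 for p, l in zip(prior_returns, learner_returns)]
```

With no obstacles the command points directly at the target; an obstacle between the agent and the target reduces the forward component. The mask switches imitation off once the learner's estimated return exceeds the prior's, which is one simple way a learner could end up outperforming a sub-optimal prior.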
