Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Coordination by Multi-Critic Policy Gradient Optimization

Abstract

Recent technological progress in the development of Unmanned Aerial Vehicles (UAVs), together with decreasing acquisition costs, makes the application of drone fleets attractive for a wide variety of tasks. In agriculture, disaster management, search and rescue operations, and commercial and military applications, the advantage of applying a fleet of drones originates from their ability to cooperate autonomously. Multi-Agent Reinforcement Learning approaches that aim to optimize a neural-network-based control policy, such as the best-performing actor-critic policy gradient algorithms, struggle to effectively back-propagate errors from distinct reward signal sources and tend to favor lucrative signals while neglecting coordination and the exploitation of previously learned similarities. We propose a Multi-Critic Policy Optimization architecture with multiple value-estimating networks and a novel advantage function that optimizes a stochastic actor policy network to achieve optimal coordination of agents. We then apply the algorithm to several tasks that require the collaboration of multiple drones in a physics-based reinforcement learning environment. Our approach achieves a stable policy network update and similarity in reward signal development for an increasing number of agents. The resulting policy achieves optimal coordination and compliance with constraints such as collision avoidance.
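The multi-critic idea can be illustrated with a small sketch. The abstract does not specify the proposed advantage function, so the code below is not the authors' method: it assumes one value estimate per reward source, one-step temporal-difference advantages, and a per-source normalization before summing, so that a large-magnitude ("lucrative") signal cannot drown out a small one. All function names and the example reward sources (`task`, `collision`) are hypothetical.

```python
from statistics import mean, pstdev


def td_advantages(rewards, values, gamma=0.99):
    """One-step TD advantages for a single reward source.

    `values` must hold one more entry than `rewards`: the last entry is
    the bootstrap value of the state reached after the final step.
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def normalize(xs, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]


def combined_advantage(rewards_by_source, values_by_source, gamma=0.99):
    """Sum per-source normalized advantages into one signal per time step.

    ASSUMPTION: equal-weight sum of normalized advantages. This is an
    illustrative choice, not the advantage function proposed in the paper.
    """
    per_source = [
        normalize(td_advantages(rewards_by_source[name], values_by_source[name], gamma))
        for name in rewards_by_source
    ]
    return [sum(step) for step in zip(*per_source)]


# Hypothetical rollout: the task reward is 100x larger than the collision reward.
rewards = {"task": [0.0, 0.0, 100.0], "collision": [0.0, 0.0, 1.0]}
values = {"task": [0.0, 0.0, 0.0, 0.0], "collision": [0.0, 0.0, 0.0, 0.0]}
advantage = combined_advantage(rewards, values)
```

In this toy rollout both sources contribute equally to `advantage` despite the 100x difference in reward scale. In an actor-critic setup the combined signal would weight the policy gradient of the actor, while each critic is trained only on its own reward source.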
