Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations

Abstract

Multi-agent reinforcement learning systems aim to provide interacting agentswith the ability to collaboratively learn and adapt to the behaviour of otheragents. In many real-world applications, the agents can only acquire a partialview of the world. Here we consider a setting whereby most agents' observationsare also extremely noisy, hence only weakly correlated to the true state of theenvironment. Under these circumstances, learning an optimal policy becomesparticularly challenging, even in the unrealistic case that an agent's policycan be made conditional upon all other agents' observations. To overcome thesedifficulties, we propose a multi-agent deep deterministic policy gradientalgorithm enhanced by a communication medium (MADDPG-M), which implements atwo-level, concurrent learning mechanism. An agent's policy depends on its ownprivate observations as well as those explicitly shared by others through acommunication medium. At any given point in time, an agent must decide whetherits private observations are sufficiently informative to be shared with others.However, our environments provide no explicit feedback informing an agentwhether a communication action is beneficial, rather the communication policiesmust also be learned through experience concurrently to the main policies. Ourexperimental results demonstrate that the algorithm performs well in six highlynon-stationary environments of progressively higher complexity, and offerssubstantial performance gains compared to the baselines.

Quick Read (beta)

loading the full paper ...