Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

Abstract

Recently, multiagent deep reinforcement learning (DRL) has received increasingly wide attention. Existing multiagent DRL algorithms are inefficient when facing the non-stationarity that arises because agents update their policies simultaneously in stochastic cooperative environments. This paper extends the recently proposed weighted double estimator to the multiagent domain and proposes a multiagent DRL framework named weighted double deep Q-network (WDDQN). By utilizing the weighted double estimator and a deep neural network, WDDQN can not only reduce the bias effectively but also be extended to scenarios with raw visual inputs. To achieve efficient cooperation in the multiagent domain, we introduce a lenient reward network and a scheduled replay strategy. Experiments show that WDDQN outperforms existing DRL and multiagent DRL algorithms, i.e., double DQN and lenient Q-learning, in terms of average reward and convergence rate in stochastic cooperative environments.
