Decentralized Likelihood Quantile Networks for Improving Performance in Deep Multi-Agent Reinforcement Learning

Abstract

Recent successful value-based multi-agent deep reinforcement learning methods employ optimism by limiting underestimating updates of the value function estimator, either through a carefully controlled learning rate (Omidshafiei et al., 2017) or through a reduced update probability (Palmer et al., 2018). To achieve full cooperation when learning independently, an agent must estimate state values contingent on having optimal teammates. Value overestimation is therefore frequently injected to counteract the negative effects of teammates' unobservable sub-optimal policies and exploration. Aiming to solve this issue through automatic scheduling, this paper introduces a decentralized quantile estimator, which we found empirically to be more stable, more sample efficient, and more likely to converge to the joint optimal policy.
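The two hand-tuned forms of optimism mentioned above can be illustrated on a single tabular value estimate. The sketch below is not the quantile estimator proposed in this paper. It shows a hysteretic update, where a smaller learning rate `beta` is applied to negative TD errors, and a lenient update, where a negative TD error is ignored with probability `leniency`. The function names and default rates are hypothetical. In Palmer et al. (2018) the leniency value is derived from a decaying temperature schedule; here the caller simply passes it in.

```python
import random


def hysteretic_update(q, target, alpha=0.1, beta=0.01):
    """Optimistic update via a controlled learning rate (hysteretic style).

    Positive TD errors use rate alpha; negative TD errors use the smaller
    rate beta, so poor outcomes (e.g. caused by an exploring teammate)
    lower the estimate only slowly. alpha and beta are illustrative values.
    """
    delta = target - q
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


def lenient_update(q, target, leniency, alpha=0.1, rng=random):
    """Optimistic update via a reduced update probability (lenient style).

    A negative TD error is skipped with probability `leniency` (in [0, 1]);
    positive TD errors are always applied. How leniency is scheduled is
    left to the caller in this sketch.
    """
    delta = target - q
    if delta < 0 and rng.random() < leniency:
        return q  # forgive the bad outcome and keep the optimistic estimate
    return q + alpha * delta


if __name__ == "__main__":
    # Same action, alternating outcomes: the teammate sometimes cooperates
    # (reward 1.0) and sometimes explores and miscoordinates (reward -1.0).
    plain, hyst = 0.0, 0.0
    for t in range(200):
        target = 1.0 if t % 2 == 0 else -1.0
        plain += 0.1 * (target - plain)  # ordinary averaging update
        hyst = hysteretic_update(hyst, target)
    print(f"plain={plain:.3f} hysteretic={hyst:.3f}")
```

With the same learning rate for both signs, the ordinary estimate hovers near the mean outcome. The hysteretic estimate stays well above it, which is the kind of injected overestimation the abstract describes. Both variants depend on manually chosen rates or schedules, and the proposed estimator aims to set that schedule automatically.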
