Noise Distribution Decomposition based Multi-Agent Distributional Reinforcement Learning

Abstract

Generally, a Reinforcement Learning (RL) agent updates its policy by repeatedly interacting with the environment, contingent on the rewards received for observed states and undertaken actions. However, environmental disturbance, which commonly leads to noisy observations (e.g., rewards and states), could significantly affect the agent's performance. Furthermore, the learning performance of Multi-Agent Reinforcement Learning (MARL) is more susceptible to noise due to the interference among intelligent agents. Therefore, it becomes imperative to redesign MARL so as to mitigate the impact of noisy rewards. In this paper, we propose a novel decomposition-based multi-agent distributional RL method that approximates the globally shared noisy reward by a Gaussian mixture model (GMM) and decomposes it into a combination of individual distributional local rewards, with which each agent can be updated locally through distributional RL. Moreover, a diffusion model (DM) is leveraged for reward generation in order to mitigate the costly interaction expenditure of learning distributions. Furthermore, the optimality of the distribution decomposition is theoretically validated, while the loss function is carefully designed to avoid decomposition ambiguity. We also verify the effectiveness of the proposed method through extensive simulation experiments with noisy rewards. Besides, different risk-sensitive policies are evaluated in order to demonstrate the superiority of distributional RL in different MARL tasks.
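
To make the GMM approximation of a noisy reward concrete, the following is a minimal sketch that fits a one-dimensional Gaussian mixture to reward samples with plain expectation-maximization (EM). It is a generic illustration only, not the decomposition or loss function proposed in the paper. The function names (`gaussian_pdf`, `fit_gmm_1d`), the quantile-based initialization, and the synthetic two-mode reward noise are all assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(samples, n_components=2, n_iters=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    ordered = sorted(samples)
    n = len(ordered)
    # Hypothetical initialization: means at evenly spaced quantiles,
    # equal weights, and the overall sample variance for every component.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(ordered) / n
    overall_var = sum((x - overall_mean) ** 2 for x in ordered) / n
    variances = [max(overall_var, min_var)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in samples:
            likes = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(likes) or 1e-300  # guard against numerical underflow
            resp.append([like / total for like in likes])

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk
            variances[k] = max(var_k, min_var)

    return weights, means, variances


# Synthetic noisy global reward (made-up parameters): with probability 0.7 the
# reward is drawn around 1.0, otherwise around -2.0.
random.seed(0)
rewards = [
    random.gauss(1.0, 0.2) if random.random() < 0.7 else random.gauss(-2.0, 0.5)
    for _ in range(1000)
]
weights, means, variances = fit_gmm_1d(rewards, n_components=2)
for w, m, v in zip(weights, means, variances):
    print(f"weight={w:.2f}  mean={m:.2f}  std={math.sqrt(v):.2f}")
```

On well-separated modes such as these, the fitted components should land close to the generating parameters. How the fitted mixture is then split into per-agent local reward distributions is specific to the paper and is not reproduced here.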
