Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning

Abstract

Reinforcement learning (RL) has been widely adopted for controlling and optimizing complex engineering systems such as next-generation wireless networks. An important challenge in adopting RL is the need for direct access to the physical environment. This limitation is particularly severe in multi-agent systems, for which conventional multi-agent reinforcement learning (MARL) requires a large number of coordinated online interactions with the environment during training. When only offline data is available, a direct application of online MARL schemes would generally fail due to the epistemic uncertainty entailed by the lack of exploration during training. In this work, we propose an offline MARL scheme that integrates distributional RL and conservative Q-learning to address the environment's inherent aleatoric uncertainty and the epistemic uncertainty arising from the use of offline data. We explore both independent and joint learning strategies. The proposed MARL scheme, referred to as multi-agent conservative quantile regression, addresses general risk-sensitive design criteria and is applied to the trajectory planning problem in drone networks, showcasing its advantages.
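
For readers unfamiliar with the ingredients named above, the following is a minimal single-agent sketch of the three generic building blocks: a quantile-regression Huber loss (distributional RL), a conservative log-sum-exp penalty in the style of conservative Q-learning, and a lower-tail CVaR as one example of a risk-sensitive criterion. The function names, the midpoint quantile levels, and the choice of CVaR are illustrative assumptions, not the paper's exact formulation, and the multi-agent aspects are not shown.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss between N predicted quantiles
    and M target return samples (assumed form, for illustration)."""
    n = pred_quantiles.shape[0]
    taus = (np.arange(n) + 0.5) / n  # assumed midpoint quantile levels
    # Pairwise errors u[i, j] = target_j - pred_i
    u = target_samples[None, :] - pred_quantiles[:, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| pushes each output toward its quantile
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return float((weight * huber / kappa).mean(axis=1).sum())

def conservative_penalty(q_all_actions, data_action):
    """CQL-style regularizer: log-sum-exp of Q over all actions minus the
    Q-value of the action actually present in the offline dataset."""
    m = q_all_actions.max()  # subtract the max for numerical stability
    lse = m + np.log(np.exp(q_all_actions - m).sum())
    return float(lse - q_all_actions[data_action])

def cvar(quantiles, alpha):
    """Lower-tail CVaR: mean of the worst alpha-fraction of return quantiles."""
    k = max(1, int(np.ceil(alpha * quantiles.shape[0])))
    return float(np.sort(quantiles)[:k].mean())
```

In a sketch like this, the quantile loss targets aleatoric uncertainty by fitting the whole return distribution, while the penalty discourages overestimating values of actions that the offline dataset does not cover, which is the epistemic concern.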
