Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning

Abstract

Offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly. In the offline setting, learning from a static dataset of past interactions makes it possible to develop robust and safe policies without live data collection, which can be fraught with challenges. Building on this, we present EAQ (Episodes Augmentation guided by Q-total loss), a novel episode-augmentation approach for offline MARL that uses diffusion models. EAQ integrates the Q-total function directly into the diffusion model as guidance to maximize the global return of an episode, eliminating the need for separate training. We focus primarily on cooperative scenarios, where agents must act collectively toward a shared goal, that is, maximizing the global return. We demonstrate that augmenting episodes in this collaborative manner significantly improves offline MARL performance relative to training on the original dataset, raising the normalized return by +17.3% and +12.9% for medium and poor behavioral policies, respectively, in the SMAC simulator.
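
The idea of steering generation with a Q-total signal can be illustrated with a generic, classifier-guidance-style reverse diffusion step. This is a minimal toy sketch under stated assumptions, not EAQ's implementation: the toy mixing function `q_tot`, its hand-coded gradient, the stand-in denoiser `0.9 * z`, the noise level `sigma` and the guidance scale are all hypothetical. A real model would denoise whole multi-agent episodes with a learned network and would obtain the gradient by automatic differentiation.

```python
import numpy as np


def q_tot(actions, targets, weights):
    # Hypothetical toy Q_tot: a monotonic (non-negative weighted) sum of per-agent
    # utilities Q_i(a_i) = -||a_i - target_i||^2, loosely in the spirit of QMIX mixing.
    per_agent = -np.sum((actions - targets) ** 2, axis=-1)
    return float(np.abs(weights) @ per_agent)


def grad_q_tot(actions, targets, weights):
    # Analytic gradient of the toy Q_tot with respect to every agent's action.
    return -2.0 * np.abs(weights)[:, None] * (actions - targets)


def guided_reverse_step(x_t, denoise_mean, sigma, scale, targets, weights, rng):
    # One reverse step: shift the denoiser's mean along grad Q_tot (scaled by the
    # step variance, as in classifier guidance), then add Gaussian noise.
    mean = denoise_mean(x_t) + scale * sigma**2 * grad_q_tot(x_t, targets, weights)
    return mean + sigma * rng.standard_normal(x_t.shape)


def sample(steps, guidance_scale, seed=0, n_agents=3, act_dim=2):
    # Run the toy reverse process from x_T ~ N(0, I) and score the final joint action.
    rng = np.random.default_rng(seed)
    targets = np.ones((n_agents, act_dim))
    weights = np.array([1.0, 0.5, 2.0])
    x = rng.standard_normal((n_agents, act_dim))
    for _ in range(steps):
        x = guided_reverse_step(x, lambda z: 0.9 * z, 0.1, guidance_scale,
                                targets, weights, rng)
    return q_tot(x, targets, weights)


if __name__ == "__main__":
    print("unguided Q_tot:", sample(50, 0.0))
    print("guided   Q_tot:", sample(50, 10.0))
```

With the guidance scale at zero, samples collapse toward the toy denoiser's fixed point near the origin and score poorly under the toy Q_tot; a positive scale pulls every agent's action toward higher joint value. This only conveys the intuition of biasing generated data toward higher global return; how EAQ actually couples Q-total with the diffusion model is not specified here.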
