Abstract
Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactions or access to system dynamics, limiting their practicality in real-world scenarios where such interactions are infeasible or difficult to model. In this paper, we present Offline Munchausen Mirror Descent (Off-MMD), a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data. By leveraging iterative mirror descent and importance sampling techniques, Off-MMD estimates the mean-field distribution from static datasets without relying on simulation or environment dynamics. Additionally, we incorporate techniques from offline reinforcement learning to address common issues like Q-value overestimation, ensuring robust policy learning even with limited data coverage. Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation, highlighting its applicability to real-world multi-agent systems where online experimentation is infeasible. We empirically demonstrate the robustness of Off-MMD to low-quality datasets and conduct experiments to investigate its sensitivity to hyperparameter choices.
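To make the idea of estimating a mean-field distribution from a static dataset concrete, the following is a minimal sketch of a self-normalized importance-sampling estimator in a tabular, finite-horizon setting. It is an illustration under stated assumptions and not necessarily the estimator used by Off-MMD: the function name `estimate_mean_field`, the array layout, the time-stationary tabular policies `pi` and `beta`, and the uniform fallback for uncovered time steps are all hypothetical choices made for this example.

```python
import numpy as np


def estimate_mean_field(states, actions, pi, beta, n_states):
    """Self-normalized importance-sampling estimate of the state
    distribution induced by a target policy, using only offline data.

    states:  int array [N, T], states visited by N logged trajectories
    actions: int array [N, T], actions chosen by the behavior policy
    pi:      float array [n_states, n_actions], target policy (assumed tabular)
    beta:    float array [n_states, n_actions], behavior policy (assumed known)
    Returns mu, a float array [T, n_states] whose rows each sum to 1.
    """
    n_traj, horizon = states.shape

    # Per-step likelihood ratios pi(a|s) / beta(a|s) for every logged pair.
    ratios = pi[states, actions] / beta[states, actions]

    # The state at time t depends only on the actions taken before t,
    # so its weight is the product of the ratios at steps 0..t-1.
    weights = np.ones((n_traj, horizon))
    weights[:, 1:] = np.cumprod(ratios[:, :-1], axis=1)

    mu = np.zeros((horizon, n_states))
    for t in range(horizon):
        # Accumulate the weight of each trajectory on the state it visits.
        np.add.at(mu[t], states[:, t], weights[:, t])
        total = mu[t].sum()
        if total > 0:
            mu[t] /= total
        else:
            # Assumption for this sketch: with no usable coverage at
            # time t, fall back to a uniform distribution.
            mu[t] = 1.0 / n_states
    return mu


# Toy usage: two logged trajectories of length 2 over two states.
states = np.array([[0, 1], [0, 0]])
actions = np.array([[1, 0], [0, 0]])
beta = np.full((2, 2), 0.5)                 # uniform behavior policy
pi = np.array([[0.0, 1.0], [0.5, 0.5]])     # target always takes action 1 in state 0
mu = estimate_mean_field(states, actions, pi, beta, n_states=2)
```

In this toy dataset the second trajectory takes an action that the target policy never chooses, so it receives zero weight after the first step and the estimate at the second time step places all mass on state 1. Self-normalization trades a small bias for lower variance, which tends to matter when the dataset covers the target policy poorly.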