Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning

Abstract

We develop a general reinforcement learning framework for mean field control (MFC) problems. Such problems arise for instance as the limit of collaborative multi-agent control problems when the number of agents is very large. The asymptotic problem can be phrased as the optimal control of a non-linear dynamics. This can also be viewed as a Markov decision process (MDP) but the key difference with the usual RL setup is that the dynamics and the reward now depend on the state's probability distribution itself. Alternatively, it can be recast as an MDP on the Wasserstein space of measures. In this work, we introduce generic model-free algorithms based on the state-action value function at the mean field level and we prove convergence for a prototypical Q-learning method. We then implement an actor-critic method and report numerical results on two archetypal problems: a finite space model motivated by a cyber security application and a continuous space model motivated by an application to swarm motion.
