Malthusian Reinforcement Learning

  • 2019-03-03 14:58:43
  • Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship between preindustrial income levels and population growth. Malthusian reinforcement learning harnesses the competitive pressures arising from growing and shrinking population size to drive agents to explore regions of state and policy spaces that they could not otherwise reach. Furthermore, in environments where there are potential gains from specialization and division of labor, we show that Malthusian reinforcement learning is better positioned to take advantage of such synergies than algorithms based on self-play.
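
The fitness-linked population dynamic described above can be illustrated with a minimal sketch. The abstract does not specify the update rule, so the exponential-weights form below, the function name `update_population_shares`, and the `step_size` parameter are all assumptions chosen for illustration, not the authors' method. The sketch only shows the qualitative property the abstract states: a subpopulation with a higher average return grows relative to the others.

```python
import math


def update_population_shares(shares, avg_returns, step_size=0.1):
    """Hypothetical fitness-linked update (an assumption, not the paper's rule).

    shares:      current fraction of the total population in each subpopulation
    avg_returns: average return recently obtained by each subpopulation
    step_size:   illustrative parameter controlling how fast sizes respond
    """
    # Scale each share by an exponential of its average return, so that
    # higher-return subpopulations gain weight relative to the rest.
    weights = [s * math.exp(step_size * r) for s, r in zip(shares, avg_returns)]
    total = sum(weights)
    # Renormalize so the shares still sum to one (fixed total population).
    return [w / total for w in weights]


# Two equally sized subpopulations; the first earns a higher average return.
shares = update_population_shares([0.5, 0.5], avg_returns=[2.0, 1.0])
```

After one update the first subpopulation holds a larger share than the second, while the total stays fixed, so growth in one subpopulation necessarily shrinks the others.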


