Multi-Agent Trust Region Policy Optimization

  • 2020-10-15 17:49:47
  • Hepeng Li, Haibo He


We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. The agents do not need to know observations, rewards, policies or value/action-value functions of other agents. The agents only share a likelihood ratio with their neighbors during the training process. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
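
To make the idea of sharing only a likelihood ratio with neighbors concrete, the following is a minimal Python sketch of a generic neighbor-averaging consensus iteration. It is not the MATRPO update described in the paper: the graph, the log-probabilities, and the function names `likelihood_ratio` and `consensus_step` are all hypothetical and chosen purely for illustration. It only shows how agents that exchange a single scalar with their neighbors can come to agree on a common value without revealing observations, rewards, or policy parameters.

```python
import math


def likelihood_ratio(logp_new, logp_old):
    """Return pi_new(a|o) / pi_old(a|o) computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def consensus_step(values, neighbors):
    """One round of neighbor averaging.

    Each agent replaces its value with the mean of its own value and
    the values received from its neighbors (illustrative rule only).
    """
    updated = []
    for i, own in enumerate(values):
        group = [own] + [values[j] for j in neighbors[i]]
        updated.append(sum(group) / len(group))
    return updated


# Hypothetical communication graph: a line of three agents, 0 - 1 - 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# Hypothetical (new, old) log-probabilities of one sampled action per agent.
local_logps = [(-1.0, -1.2), (-0.7, -0.7), (-2.0, -1.8)]

# Each agent computes its ratio locally; only this scalar is ever shared.
ratios = [likelihood_ratio(new, old) for new, old in local_logps]

for _ in range(50):
    ratios = consensus_step(ratios, neighbors)

# After enough rounds on a connected graph, the agents' values agree.
spread = max(ratios) - min(ratios)
```

In this sketch, agents 0 and 2 never communicate directly, yet their values still converge because the graph is connected through agent 1. The paper instead derives its updates from a consensus optimization formulation of the TRPO policy update, so this averaging rule should be read only as an analogy for the communication pattern.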

