Abstract
The dynamics between agents and the environment are an important component of multi-agent Reinforcement Learning (RL), and learning them provides a basis for decision making. However, a major challenge in optimizing a learned dynamics model is the accumulation of error when predicting multiple steps into the future. Recent advances in variational inference provide model-based solutions that predict complete trajectory segments and optimize over a latent representation of trajectories. For single-agent scenarios, several recent studies have explored this idea and shown its benefits over conventional methods. In this work, we extend this approach to the multi-agent case and effectively optimize over a latent space that encodes multi-agent strategies. We discuss the challenges in optimizing over a latent variable model for multiple agents, both in the optimization algorithm and in the model representation, and propose a method for both cooperative and competitive settings based on risk-sensitive optimization. We evaluate our method on tasks in the multi-agent particle environment and on a simulated RoboCup domain.