Abstract
We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems, an important domain for robots interacting with other agents in the same workspace. For non-trivial dynamical systems, MBRL typically suffers from accumulating errors. Several recent studies have addressed this problem by learning latent variable models for trajectory segments and optimizing over behavior in the latent space. In this work, we investigate whether this approach can be extended to 2-agent competitive and cooperative settings. The fundamental challenge is how to learn models that capture interactions between agents, yet are disentangled to allow for optimization of each agent's behavior separately. We propose such models based on a disentangled variational auto-encoder, and demonstrate our approach on a simulated 2-robot manipulation task, where one robot can either help or distract the other. We show that our approach has better sample efficiency than a strong model-free RL baseline, and can learn both cooperative and adversarial behavior from the same data.