Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

Abstract

Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
