Abstract
Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids and surveillance. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. However, these optimizations are usually compute- and memory-intensive, which slows end-to-end training. In this work, we treat speed performance (i.e., latency-bounded throughput) as the key metric for MARL implementations and analyze it. Specifically, we first introduce a taxonomy of MARL algorithms from an acceleration perspective, categorized by (1) training scheme and (2) communication method. Using this taxonomy, we select three state-of-the-art MARL algorithms as target benchmarks: Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Target-oriented Multi-agent Communication and Cooperation (ToM2C), and Networked Multi-Agent RL (NeurComm). We then provide a systematic analysis of their performance bottlenecks on a homogeneous multi-core CPU platform. We argue that latency-bounded throughput should be a key performance metric in future MARL literature, and we identify opportunities for parallelization and acceleration.