From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos

Abstract

Training a team of agents from scratch in multi-agent reinforcement learning (MARL) is highly inefficient, much like asking beginners to play a symphony together without first practicing solo. Existing methods, such as offline or transferable MARL, can ease this burden, but they still rely on costly multi-agent data, which often becomes the bottleneck. In contrast, solo experiences are far easier to obtain in many important scenarios, e.g., collaborative coding, household cooperation, and search-and-rescue. To unlock their potential, we propose Solo-to-Collaborative RL (SoCo), a framework that transfers solo knowledge into cooperative learning. SoCo first pretrains a shared solo policy from solo demonstrations, then adapts it for cooperation during multi-agent training through a policy fusion mechanism that combines an MoE-like gating selector and an action editor. Experiments across diverse cooperative tasks show that SoCo significantly boosts the training efficiency and performance of backbone algorithms. These results demonstrate that solo demonstrations provide a scalable and effective complement to multi-agent data, making cooperative learning more practical and broadly applicable.
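
The abstract names the two components of the policy fusion mechanism but does not say how they are wired together. The sketch below shows one plausible reading, offered only as an illustration and not as the authors' implementation: a pretrained solo policy proposes an action, a hypothetical action editor produces an adjusted action, and a softmax gate (the "MoE-like" part, as assumed here) blends the two. Every name (`fuse_action`, `toy_solo_policy`, `toy_gate_logits`, `toy_action_editor`) and the convex-blend rule itself are assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a small list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_action(
    obs: Vector,
    solo_policy: Callable[[Vector], Vector],
    gate_logits: Callable[[Vector], Sequence[float]],
    action_editor: Callable[[Vector, Vector], Vector],
) -> Vector:
    """Blend the solo action with an edited action (assumed fusion rule).

    The two-way gate and the convex combination are illustrative
    assumptions; the abstract does not specify them.
    """
    a_solo = solo_policy(obs)              # action from the pretrained solo policy
    a_edit = action_editor(obs, a_solo)    # cooperative adjustment of that action
    w_solo, w_edit = softmax(gate_logits(obs))
    return [w_solo * s + w_edit * e for s, e in zip(a_solo, a_edit)]


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_solo_policy(obs: Vector) -> Vector:
    return [0.5 * x for x in obs]


def toy_gate_logits(obs: Vector) -> Sequence[float]:
    return [0.0, 0.0]  # equal logits, so both branches get weight 0.5


def toy_action_editor(obs: Vector, action: Vector) -> Vector:
    return [a + 1.0 for a in action]


fused = fuse_action([2.0, -4.0], toy_solo_policy, toy_gate_logits, toy_action_editor)
```

In a trained system the gate and editor would be learned during multi-agent training while the solo policy supplies the prior; in this toy setup they are fixed functions chosen only to make the data flow visible.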
