Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning

Abstract

Multi-agent reinforcement learning typically suffers from the problem of sample inefficiency, where learning suitable policies involves the use of many data samples. Learning from external demonstrators is a possible solution that mitigates this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture, and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms give better performances than baselines, can effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
