Abstract
We present a framework and algorithm for peer-to-peer teaching in cooperative multiagent reinforcement learning. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), trains advising policies by using students' learning progress as a teaching reward. Agents using LeCTR learn to assume the role of a teacher or student at the appropriate moments, exchanging action advice to accelerate the entire learning process. Our algorithm supports teaching heterogeneous teammates and advising under communication constraints, and it learns both what and when to advise. LeCTR is demonstrated to outperform prior teaching methods in both final performance and rate of learning on multiple benchmark domains. To our knowledge, this is the first approach for learning to teach in a multiagent setting.
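The idea of rewarding a teacher with a student's learning progress can be illustrated with a small sketch. The abstract does not say how learning progress is measured. This example therefore assumes a toy single-state bandit with known true action rewards. It measures progress as the change in true reward of the student's greedy action before and after the student learns from one piece of advice. The names `Student`, `performance`, `teaching_reward`, and `TRUE_REWARDS` are hypothetical. The sketch does not reproduce LeCTR's actual reward definitions.

```python
from dataclasses import dataclass, field

# Hypothetical toy setting (assumption): a single-state bandit whose true
# action rewards are known to the evaluator but not to the student.
TRUE_REWARDS = [0.1, 0.9, 0.4]


@dataclass
class Student:
    """Tabular learner: one Q-value per action, updated toward observed reward."""
    alpha: float = 0.5
    q: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def greedy_action(self) -> int:
        # Ties are broken in favor of the lowest action index.
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, action: int, reward: float) -> None:
        self.q[action] += self.alpha * (reward - self.q[action])


def performance(student: Student) -> float:
    """Assumed learning-progress proxy: true reward of the student's greedy action."""
    return TRUE_REWARDS[student.greedy_action()]


def teaching_reward(student: Student, advised_action: int) -> float:
    """Reward the teacher by how much the student improves after following advice."""
    before = performance(student)
    # The student executes the advised action and learns from its outcome.
    student.update(advised_action, TRUE_REWARDS[advised_action])
    after = performance(student)
    return after - before


if __name__ == "__main__":
    print(teaching_reward(Student(), advised_action=1))  # good advice: positive reward
    print(teaching_reward(Student(), advised_action=0))  # uninformative advice: zero reward
```

Under this assumed measure, advice that moves the student toward a better greedy action earns a positive teaching reward. Advice that leaves the student's behavior unchanged earns nothing. A teacher trained on this signal is pushed to advise only when the advice actually helps.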