Abstract
Heterogeneous knowledge naturally arises among different agents in cooperative multiagent reinforcement learning. As such, learning can be greatly improved if agents can effectively pass their knowledge on to other agents. Existing work has demonstrated that peer-to-peer knowledge transfer, a process referred to as action advising, improves team-wide learning. In contrast to previous frameworks that advise at the level of primitive actions, we aim to learn high-level teaching policies that decide when and what high-level action (e.g., sub-goal) to advise a teammate. We introduce a new learning to teach framework, called hierarchical multiagent teaching (HMAT). The proposed framework solves difficulties faced by prior work on multiagent teaching when operating in domains with long horizons, delayed rewards, and continuous states/actions by leveraging temporal abstraction and deep function approximation. Our empirical evaluations show that HMAT accelerates team-wide learning progress in difficult environments that are more complex than those explored in previous work. HMAT also learns teaching policies that can be transferred to different teammates/tasks and can even teach teammates with heterogeneous action spaces.