Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

Abstract

In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train a latent pivotal state model and a curiosity-driven goal-conditioned policy in a task-agnostic manner. Second, provided with the information from the world graph, a high-level Manager quickly finds solutions to new tasks and expresses subgoals in reference to pivotal states to a low-level Worker. The Worker can then also leverage the graph to easily traverse to the pivotal states of interest, even across long distances, and explore non-locally. We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages from the proposed framework over baselines that lack world graph knowledge in terms of performance and efficiency.
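
To make the graph abstraction concrete, the following is a minimal illustrative sketch, not the method used in the paper. It assumes pivotal states can be represented as maze grid coordinates and that the world graph is stored as an adjacency dictionary. The names `world_graph` and `shortest_pivotal_path`, as well as the example coordinates, are hypothetical. Breadth-first search stands in for whatever traversal procedure the Worker actually uses to move between pivotal states.

```python
from collections import deque


def shortest_pivotal_path(graph, start, goal):
    """Breadth-first search over pivotal states.

    Returns the list of nodes from start to goal (inclusive),
    or None if either node is unknown or no path exists.
    """
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical world graph: nodes are pivotal states (grid cells),
# edges are feasible traversals between them.
world_graph = {
    (0, 0): [(0, 3)],
    (0, 3): [(0, 0), (4, 3)],
    (4, 3): [(0, 3), (4, 7)],
    (4, 7): [(4, 3)],
    (9, 9): [],  # a pivotal state with no known traversal edges
}

path = shortest_pivotal_path(world_graph, (0, 0), (4, 7))
```

In this toy setting, a subgoal expressed as a pivotal state such as `(4, 7)` can be reached by following the returned sequence of pivotal states edge by edge, which is one way to picture long-distance, non-local traversal over the graph.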
