On the Complexity of Exploration in Goal-Driven Navigation

Abstract

Building agents that can explore their environments intelligently is achallenging open problem. In this paper, we make a step towards understandinghow a hierarchical design of the agent's policy can affect its explorationcapabilities. First, we design EscapeRoom environments, where the agent mustfigure out how to navigate to the exit by accomplishing a number ofintermediate tasks (\emph{subgoals}), such as finding keys or opening doors.Our environments are procedurally generated and vary in complexity, which canbe controlled by the number of subgoals and relationships between them. Next,we propose to measure the complexity of each environment by constructingdependency graphs between the goals and analytically computing \emph{hittingtimes} of a random walk in the graph. We empirically evaluate Proximal PolicyOptimization (PPO) with sparse and shaped rewards, a variation of policysketches, and a hierarchical version of PPO (called HiPPO) akin to h-DQN. Weshow that analytically estimated \emph{hitting time} in goal dependency graphsis an informative metric of the environment complexity. We conjecture that theresult should hold for environments other than navigation. Finally, we showthat solving environments beyond certain level of complexity requireshierarchical approaches.

Quick Read (beta)

loading the full paper ...