Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

Abstract

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary components: (1) a visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images, and (2) a progress monitor to ensure the grounded instruction correctly reflects the navigation progress. We test our self-monitoring agent on a standard benchmark and analyze our proposed approach through a series of ablation studies that elucidate the contributions of the primary components. Using our proposed method, we set the new state of the art by a significant margin (8% absolute increase in success rate on the unseen test set). Code is available at https://github.com/chihyaoma/selfmonitoring-agent .
