Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments

Abstract

We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging because there are no expert demonstrations for training and little structural prior about the environment to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refines it using superpixel clustering to improve navigation stability. CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12 percent and 13 percent in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
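The abstract describes CSM only at a high level. As a rough illustration of what "switching sub-instructions in a constraint-aware manner" can mean, the Python sketch below keeps an ordered list of sub-instructions, each carrying a set of completion checks, and advances to the next sub-instruction only once every check for the active one passes. The class names (`SubInstruction`, `SubInstructionManager`), the dict-based observation format, and the example constraints are all hypothetical. How CA-Nav actually derives and verifies its constraints is not specified here, so treat this as a sketch under those assumptions rather than the method itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical observation format: a flat dict of perception signals
# (e.g. current room label, detected objects). Not the paper's interface.
Observation = Dict[str, object]
Constraint = Callable[[Observation], bool]


@dataclass
class SubInstruction:
    text: str
    # Completion criteria for this sub-instruction; all must hold to switch.
    constraints: List[Constraint] = field(default_factory=list)

    def is_complete(self, obs: Observation) -> bool:
        return all(check(obs) for check in self.constraints)


class SubInstructionManager:
    """Illustrative constraint-aware switching over an ordered sub-instruction list."""

    def __init__(self, subs: List[SubInstruction]) -> None:
        self.subs = subs
        self.index = 0  # position of the active sub-instruction

    @property
    def finished(self) -> bool:
        return self.index >= len(self.subs)

    def current(self) -> Optional[SubInstruction]:
        return None if self.finished else self.subs[self.index]

    def update(self, obs: Observation) -> Optional[SubInstruction]:
        # Switch to the next sub-instruction only when the active one's
        # constraints are satisfied; otherwise keep pursuing it.
        active = self.current()
        if active is not None and active.is_complete(obs):
            self.index += 1
        return self.current()


if __name__ == "__main__":
    manager = SubInstructionManager([
        SubInstruction("exit the bedroom", [lambda o: o["room"] == "hallway"]),
        SubInstruction("stop next to the sofa", [lambda o: "sofa" in o["objects"]]),
    ])
    trace = [
        {"room": "bedroom", "objects": []},
        {"room": "hallway", "objects": []},
        {"room": "living room", "objects": ["sofa"]},
    ]
    for obs in trace:
        active = manager.update(obs)
        print(active.text if active else "done")
    # prints: exit the bedroom / stop next to the sofa / done
```

Advancing at most one sub-instruction per observation keeps progress tracking conservative: a single observation cannot skip ahead past a sub-instruction whose constraints were never checked while it was active.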
