NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments

Abstract

Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to execute sequential navigation actions in complex environments guided by natural language instructions. Current approaches often struggle with generalizing to novel environments and adapting to ongoing changes during navigation. Inspired by human cognition, we present NavMorph, a self-evolving world model framework that enhances environmental understanding and decision-making in VLN-CE tasks. NavMorph employs compact latent representations to model environmental dynamics, equipping agents with foresight for adaptive planning and policy refinement. By integrating a novel Contextual Evolution Memory, NavMorph leverages scene-contextual information to support effective navigation while maintaining online adaptability. Extensive experiments demonstrate that our method achieves notable performance improvements on popular VLN-CE benchmarks. Code is available at https://github.com/Feliciaxyao/NavMorph.
