UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents

  • 2025-08-22 00:25:53
  • Jianqiang Xiao, Yuexuan Sun, Yixin Shao, Boxi Gan, Rongqiang Liu, Yanjing Wu, Weili Guan, Xiang Deng

Abstract

Aerial navigation is a fundamental yet underexplored capability in embodied intelligence, enabling agents to operate in large-scale, unstructured environments where traditional navigation paradigms fall short. However, most existing research follows the Vision-and-Language Navigation (VLN) paradigm, which heavily depends on sequential linguistic instructions, limiting its scalability and autonomy. To address this gap, we introduce UAV-ON, a benchmark for large-scale Object Goal Navigation (ObjectNav) by aerial agents in open-world environments, where agents operate based on high-level semantic goals without relying on detailed instructional guidance as in VLN. UAV-ON comprises 14 high-fidelity Unreal Engine environments with diverse semantic regions and complex spatial layouts, covering urban, natural, and mixed-use settings. It defines 1270 annotated target objects, each characterized by an instance-level instruction that encodes category, physical footprint, and visual descriptors, allowing grounded reasoning. These instructions serve as semantic goals, introducing realistic ambiguity and complex reasoning challenges for aerial agents. To evaluate the benchmark, we implement several baseline methods, including Aerial ObjectNav Agent (AOA), a modular policy that integrates instruction semantics with egocentric observations for long-horizon, goal-directed exploration. Empirical results show that all baselines struggle in this setting, highlighting the compounded challenges of aerial navigation and semantic goal grounding. UAV-ON aims to advance research on scalable UAV autonomy driven by semantic goal descriptions in complex real-world environments.

 
