Abstract
As autonomous robotic systems become increasingly mature, users will want to specify missions at the level of intent rather than in low-level detail. Language is an expressive and intuitive medium for such mission specification. However, realizing language-guided robotic teams requires overcoming significant technical hurdles. Interpreting and realizing language-specified missions requires advanced semantic reasoning. Successful heterogeneous robots must effectively coordinate actions and share information across varying viewpoints. Additionally, communication between robots is typically intermittent, necessitating robust strategies that leverage communication opportunities to maintain coordination and achieve mission objectives. In this work, we present a first-of-its-kind system where an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV) are able to collaboratively accomplish missions specified in natural language while reacting to changes in specification on the fly. We leverage a Large Language Model (LLM)-enabled planner to reason over semantic-metric maps that are built online and opportunistically shared between an aerial and a ground robot. We consider task-driven navigation in urban and rural areas. Our system must infer mission-relevant semantics and actively acquire information via semantic mapping. In both ground and air-ground teaming experiments, we demonstrate our system on seven different natural-language specifications at up to kilometer-scale navigation.