Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

  • 2025-06-18 18:58:17
  • Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang

Abstract

AI agents today are mostly siloed - they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning and action - but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks including cooking, navigation, shopping, tourism, and geolocation - all requiring coordinated reasoning across physical and digital realms for systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access. All datasets, code and websites are publicly available at our project page https://embodied-web-agent.github.io/.

 
