Abstract
Planning with partial observation is a central challenge in embodied AI. A majority of prior works have tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can $\textit{imagine}$ unseen parts of the world through a mental exploration and $\textit{revise}$ their beliefs with imagined observations. Such updated beliefs can allow them to make more informed decisions, without necessitating the physical exploration of the world at all times. To achieve this human-like ability, we introduce the $\textit{Generative World Explorer (Genex)}$, an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world (e.g., urban scenes) and acquire imagined observations to update its belief. This updated belief will then help the agent to make a more informed decision at the current step. To train $\textit{Genex}$, we create a synthetic urban scene dataset, Genex-DB. Our experimental results demonstrate that (1) $\textit{Genex}$ can generate high-quality and consistent observations during long-horizon exploration of a large virtual physical world and (2) the beliefs updated with the generated observations can inform an existing decision-making model (e.g., an LLM agent) to make better plans.