Embodied World Models Emerge from Navigational Task in Open-Ended Environments

Abstract

Understanding how artificial systems can develop spatial awareness and reasoning has long been a challenge in AI research. Traditional models often rely on passive observation, but embodied cognition theory suggests that deeper understanding emerges from active interaction with the environment. This study investigates whether neural networks can autonomously internalize spatial concepts through interaction, focusing on planar navigation tasks. Using Gated Recurrent Units (GRUs) combined with Meta-Reinforcement Learning (Meta-RL), we show that agents can learn to encode spatial properties such as direction, distance, and obstacle avoidance. We introduce Hybrid Dynamical Systems (HDS) to model the agent-environment interaction as a closed dynamical system, revealing stable limit cycles that correspond to optimal navigation strategies. Ridge Representation allows us to map navigation paths into a fixed-dimensional behavioral space, enabling comparison with neural states. Canonical Correlation Analysis (CCA) confirms strong alignment between these representations, suggesting that the agent's neural states actively encode spatial knowledge. Intervention experiments further show that specific neural dimensions are causally linked to navigation performance. This work provides an approach to bridging the gap between action and perception in AI, offering new insights into building adaptive, interpretable models that can generalize across complex environments. The causal validation of neural representations also opens new avenues for understanding and controlling the internal mechanisms of AI systems, pushing the boundaries of how machines learn and reason in dynamic, real-world scenarios.
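
The CCA comparison between neural states and behavioral representations can be illustrated with a minimal sketch. It assumes GRU hidden states are collected into a matrix `H` (samples × hidden units) and per-path behavioral descriptors into a matrix `B` (samples × features); these names, the synthetic data, and the QR-plus-SVD formulation of CCA are illustrative assumptions, not necessarily the procedure used in this work.

```python
import numpy as np


def canonical_correlations(H: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return the canonical correlations between the columns of H and B.

    H: (n_samples, n_neural_dims) hypothetical matrix of GRU hidden states.
    B: (n_samples, n_behavior_dims) hypothetical matrix of behavioral features.
    Assumes both centered matrices have full column rank and that n_samples
    exceeds the number of columns in each.
    """
    # Center each variable so the analysis measures correlation, not raw overlap.
    Hc = H - H.mean(axis=0)
    Bc = B - B.mean(axis=0)
    # Orthonormal bases for the two column spaces (reduced QR decomposition).
    Qh, _ = np.linalg.qr(Hc)
    Qb, _ = np.linalg.qr(Bc)
    # Singular values of Qh^T Qb are the cosines of the principal angles between
    # the two subspaces, i.e. the canonical correlations, in descending order.
    rho = np.linalg.svd(Qh.T @ Qb, compute_uv=False)
    # Guard against round-off pushing values slightly outside [0, 1].
    return np.clip(rho, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Synthetic stand-in data: "behavior" is a noisy linear readout of 8 "neural" dims.
    H = rng.standard_normal((n, 8))
    B = H @ rng.standard_normal((8, 3)) + 0.1 * rng.standard_normal((n, 3))
    # Expected to be near 1, because B is constructed from H.
    print(canonical_correlations(H, B))
```

Working from orthonormal bases avoids explicitly inverting covariance matrices, which is numerically steadier when hidden units are strongly correlated. For high-dimensional recordings, regularized or cross-validated CCA variants are commonly preferred so that alignment is not overstated by fitting noise.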
