Abstract
Understanding the behavior of deep reinforcement learning (DRL) agents -- particularly as task and agent sophistication increase -- requires more than simple comparison of reward curves, yet standard methods for behavioral analysis remain underdeveloped in DRL. We apply tools from neuroscience and ethology to study DRL agents in a novel, complex, partially observable environment, ForageWorld, designed to capture key aspects of real-world animal foraging -- including sparse, depleting resource patches, predator threats, and spatially extended arenas. We use this environment as a platform for applying joint behavioral and neural analysis to agents, revealing detailed, quantitatively grounded insights into agent strategies, memory, and planning. Contrary to common assumptions, we find that model-free RNN-based DRL agents can exhibit structured, planning-like behavior purely through emergent dynamics -- without requiring explicit memory modules or world models. Our results show that studying DRL agents like animals -- analyzing them with neuroethology-inspired tools that reveal structure in both behavior and neural dynamics -- uncovers rich structure in their learning dynamics that would otherwise remain invisible. We distill these tools into a general analysis framework linking core behavioral and representational features to diagnostic methods, which can be reused for a wide range of tasks and agents. As agents grow more complex and autonomous, bridging neuroscience, cognitive science, and AI will be essential -- not just for understanding their behavior, but for ensuring safe alignment and maximizing desirable behaviors that are hard to measure via reward. We show how this can be done by drawing on lessons from how biological intelligence is studied.