Abstract
Reinforcement learning agents often forget details of the past, especially after delays or distractor tasks. Agents with common memory architectures struggle to recall and integrate across multiple timesteps of a past event, or even to recall the details of a single timestep that is followed by distractor tasks. To address these limitations, we propose a Hierarchical Transformer Memory (HTM), which helps agents to remember the past in detail. HTM stores memories by dividing the past into chunks, and recalls by first performing high-level attention over coarse summaries of the chunks, and then performing detailed attention within only the most relevant chunks. An agent with HTM can therefore "mentally time-travel" -- remember past events in detail without attending to all intervening events. We show that agents with HTM substantially outperform agents with other memory architectures at tasks requiring long-term recall, retention, or reasoning over memory. These include recalling where an object is hidden in a 3D environment, rapidly learning to navigate efficiently in a new neighborhood, and rapidly learning and retaining new object names. Agents with HTM can extrapolate to task sequences an order of magnitude longer than they were trained on, and can even generalize zero-shot from a meta-learning setting to maintaining knowledge across episodes. HTM improves agent sample efficiency, generalization, and generality (by solving tasks that previously required specialized architectures). Our work is a step towards agents that can learn, interact, and adapt in complex and temporally-extended environments.
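The two-stage recall described above (coarse attention over chunk summaries, then detailed attention within the most relevant chunks) can be illustrated with a minimal sketch. This is not the implementation used in this work: it assumes that each chunk summary is the mean of the chunk's memory vectors, that memories serve as both keys and values with no learned projections, and that a single attention head is used. The names `hierarchical_attention`, `chunk_size`, and `top_k` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def chunk_memory(memories, chunk_size):
    """Divide the stored past into consecutive fixed-size chunks."""
    return [memories[i:i + chunk_size]
            for i in range(0, len(memories), chunk_size)]


def summarize(chunk):
    """Coarse summary of a chunk. Assumption: a simple mean over the chunk."""
    return weighted_sum([1.0 / len(chunk)] * len(chunk), chunk)


def hierarchical_attention(query, memories, chunk_size=4, top_k=2):
    """Two-stage recall: score chunk summaries, then attend in the top-k chunks.

    Returns the recalled vector and the indices of the selected chunks.
    """
    chunks = chunk_memory(memories, chunk_size)
    summaries = [summarize(c) for c in chunks]
    scale = math.sqrt(len(query))

    # Stage 1: high-level attention over the coarse summaries.
    chunk_scores = [dot(query, s) / scale for s in summaries]
    k = min(top_k, len(chunks))
    selected = sorted(range(len(chunks)),
                      key=lambda i: chunk_scores[i], reverse=True)[:k]
    chunk_weights = softmax([chunk_scores[i] for i in selected])

    # Stage 2: detailed attention within only the selected chunks.
    per_chunk = []
    for i in selected:
        weights = softmax([dot(query, m) / scale for m in chunks[i]])
        per_chunk.append(weighted_sum(weights, chunks[i]))

    # Combine the per-chunk results, weighted by chunk relevance.
    return weighted_sum(chunk_weights, per_chunk), selected
```

In this sketch, a query is compared against one summary per chunk plus the individual memories of at most `top_k` chunks, rather than against every stored memory, which is what allows detailed recall of a distant event without attending to all intervening events.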