Abstract
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games, surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
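To make "self-attention to iteratively reason about the relations between entities" concrete, here is a minimal single-head sketch under stated assumptions. Each entity is a feature vector; every entity attends to every other entity via scaled dot-product attention, and the same block is applied for several rounds with shared weights. The function names (`self_attention`, `relational_block`), the identity weight matrices, the residual connection, and the feature-wise max pooling used to summarize the scene for a policy are illustrative choices, not the authors' exact architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out x in) weight matrix by an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(entities: List[Vector], wq: Matrix, wk: Matrix,
                   wv: Matrix) -> Tuple[List[Vector], List[Vector]]:
    """One round of single-head scaled dot-product self-attention.

    Returns the updated entities (with a residual connection) and the
    attention weights, where weights[i][j] is how much entity i attends to j.
    """
    queries = [matvec(wq, e) for e in entities]
    keys = [matvec(wk, e) for e in entities]
    values = [matvec(wv, e) for e in entities]
    d = len(keys[0])
    updated, weights = [], []
    for i, q in enumerate(queries):
        scores = [sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Attention-weighted sum of all entities' value vectors.
        mixed = [sum(a * v[c] for a, v in zip(attn, values))
                 for c in range(len(values[0]))]
        # Residual connection (assumes value dim == entity dim).
        updated.append([e + m for e, m in zip(entities[i], mixed)])
    return updated, weights


def relational_block(entities: List[Vector], wq: Matrix, wk: Matrix,
                     wv: Matrix, rounds: int = 2):
    """Apply the same attention weights repeatedly (iterative reasoning)."""
    history = []
    for _ in range(rounds):
        entities, w = self_attention(entities, wq, wk, wv)
        history.append(w)
    return entities, history


# Hypothetical toy scene: three 2-d entities, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
scene = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, history = relational_block(scene, identity, identity, identity, rounds=2)
# Feature-wise max pooling: one assumed way to get a fixed-size policy input.
pooled = [max(col) for col in zip(*out)]
```

Because the attention weights in `history` are explicit per-entity distributions, they can be inspected after each round, which is one reason attention-based relational modules lend themselves to the kind of interpretability the abstract describes.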