Attention models have had a significant positive impact on deep learning across a range of tasks. However, previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. We propose the first combination of self-attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment. Unlike the selective attention models used in previous attempts, which constrain the attention via preconceived notions of importance, our implementation utilises the Markovian properties inherent in the state input. Our method produces a faithful visualisation of the policy, focusing on the behaviour of the agent. Our experiments demonstrate that the trained policies use multiple simultaneous foci of attention, and are able to modulate attention over time to deal with situations of partial observability.