Abstract
Deep Reinforcement Learning (RL) involves the use of Deep Neural Networks (DNNs) to make sequential decisions in order to maximize reward. For many tasks, the resulting sequence of actions produced by a Deep RL policy can be long and difficult for humans to understand. A crucial component of human explanations is selectivity, whereby only key decisions and causes are recounted. Imbuing Deep RL agents with such an ability would make their resulting policies easier to understand from a human perspective and would generate a concise set of instructions to aid the learning of future agents. To this end, we use a Deep RL agent with an episodic memory system to identify and recount key decisions during policy execution. We show that these decisions form a short, human-readable explanation that can also be used to speed up the learning of naive Deep RL agents in an algorithm-independent manner.