Machine versus Human Attention in Deep Reinforcement Learning Tasks

Abstract

Deep reinforcement learning (RL) algorithms are powerful tools for solving visuomotor decision tasks. However, the trained models are often difficult to interpret, because they are represented as end-to-end deep neural networks. In this paper, we shed light on the inner workings of such trained models by analyzing the pixels that they attend to during task execution, and comparing them with the pixels attended to by humans executing the same tasks. To this end, we investigate the following two questions that, to the best of our knowledge, have not been previously studied: 1) How similar are the visual features learned by RL agents and humans when performing the same task? and 2) How do similarities and differences in these learned features explain RL agents' performance on these tasks? Specifically, we compare the saliency maps of RL agents against visual attention models of human experts when learning to play Atari games. Further, we analyze how hyperparameters of the deep RL algorithm affect the learned features and saliency maps of the trained agents. The insights provided by our results have the potential to inform novel algorithms for the purpose of closing the performance gap between human experts and deep RL agents.
