Neuroevolution of Self-Interpretable Agents

Abstract

Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations that are useful for our agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends to only task-critical visual hints, it is able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail. Videos of our results and source code are available at https://attentionagent.github.io/
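
To make the indirect-encoding argument concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the image has already been cut into flattened patches, and every size in it (100 patches, 48 values per patch, projection width 4, 10 selected patches) is a hypothetical choice made for illustration. The helper names attention_matrix and top_k_patches are likewise invented here, and ranking patches by column sums of the attention matrix is one plausible reading of a "self-attention bottleneck", assumed rather than taken from the abstract.

```python
import numpy as np

def attention_matrix(patches, w_query, w_key):
    """Return the (N, N) row-wise softmax attention matrix for N flattened patches."""
    d = w_query.shape[1]
    queries = patches @ w_query                   # (N, d)
    keys = patches @ w_key                        # (N, d)
    scores = queries @ keys.T / np.sqrt(d)        # (N, N) scaled dot products
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def top_k_patches(attention, k):
    """Indices of the k patches that receive the most attention.

    Sorting and slicing is a discrete, non-differentiable step (an assumed
    stand-in for the bottleneck), which is why a gradient-free optimizer
    such as neuroevolution is a natural fit.
    """
    importance = attention.sum(axis=0)            # column sums: attention each patch receives
    return np.argsort(importance)[::-1][:k]

rng = np.random.default_rng(0)
n_patches, patch_dim, d = 100, 48, 4              # hypothetical sizes
patches = rng.standard_normal((n_patches, patch_dim))
w_query = rng.standard_normal((patch_dim, d))     # learned parameters
w_key = rng.standard_normal((patch_dim, d))       # learned parameters

attention = attention_matrix(patches, w_query, w_key)
selected = top_k_patches(attention, k=10)

n_learned = w_query.size + w_key.size             # 2 * 48 * 4 = 384
n_implicit = attention.size                       # 100 * 100 = 10000
print(f"{n_learned} key-query parameters generate {n_implicit} attention weights")
```

In this toy setting, 384 key-query parameters produce a 100 x 100 matrix of input-dependent weights, and only the 10 selected patch indices would be passed on to a downstream controller. The ratio grows with the number of patches, since the learned parameter count does not depend on it.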
