Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents

Abstract

As deep reinforcement learning driven by visual perception becomes more widely used there is a growing need to better understand and probe the learned agents. Understanding the decision making process and its relationship to visual inputs can be very valuable to identify problems in learned behavior. However, this topic has been relatively under-explored in the research community. In this work we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved are often interesting to understand the situational awareness of the system as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioural weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications.
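
The core idea of optimizing a target function in the latent space of a generative model can be illustrated with a toy sketch. This is not the authors' implementation: `decode`, `target_loss`, and `optimize_latent` are hypothetical names, the decoder is a fixed linear map standing in for a trained generative model, the "agent" is a linear value estimate, and gradients are approximated by finite differences rather than backpropagation.

```python
import random


def decode(z, weights):
    """Hypothetical stand-in for a trained decoder: linear map from latent z to a state."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def target_loss(state, agent_weights, desired_value):
    """Hypothetical target function: squared gap between a toy agent's
    value estimate for the state and the value we want to elicit."""
    value = sum(a * s for a, s in zip(agent_weights, state))
    return (value - desired_value) ** 2


def optimize_latent(weights, agent_weights, desired_value, dim,
                    steps=500, lr=0.05, eps=1e-4, seed=0):
    """Gradient descent on the latent code z (not on the state itself),
    using forward finite differences to approximate the gradient."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        base = target_loss(decode(z, weights), agent_weights, desired_value)
        grad = []
        for i in range(dim):
            z_shift = list(z)
            z_shift[i] += eps
            shifted = target_loss(decode(z_shift, weights), agent_weights, desired_value)
            grad.append((shifted - base) / eps)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


# Toy setup (all values are illustrative assumptions).
DECODER_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 2-D latent -> 3-D state
AGENT_WEIGHTS = [0.5, -0.2, 0.3]                        # toy linear value function

z_star = optimize_latent(DECODER_WEIGHTS, AGENT_WEIGHTS, desired_value=1.0, dim=2)
state_of_interest = decode(z_star, DECODER_WEIGHTS)
```

Searching over the latent code rather than raw pixels keeps the synthesized state on the manifold the generative model has learned, which is the motivation the abstract gives for learning a model over the state space first.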
