Transparency and Explanation in Deep Reinforcement Learning Neural Networks

Abstract

Autonomous AI systems will be entering human society in the near future to provide services and work alongside humans. For those systems to be accepted and trusted, the users should be able to understand the reasoning process of the system, i.e. the system should be transparent. System transparency enables humans to form coherent explanations of the system's decisions and actions. Transparency is important not only for user trust, but also for software debugging and certification. In recent years, Deep Neural Networks have made great advances in multiple application areas. However, deep neural networks are opaque. In this paper, we report on work in transparency in Deep Reinforcement Learning Networks (DRLN). Such networks have been extremely successful in accurately learning action control in image input domains, such as Atari games. In this paper, we propose a novel and general method that (a) incorporates explicit object recognition processing into deep reinforcement learning models, (b) forms the basis for the development of "object saliency maps", to provide visualization of internal states of DRLNs, thus enabling the formation of explanations and (c) can be incorporated in any existing deep reinforcement learning framework. We present computational results and human experiments to evaluate our approach.
