Explaining Deep Neural Networks

Abstract

Deep neural networks are becoming more and more popular due to their revolutionary success in diverse areas, such as computer vision, natural language processing, and speech recognition. However, the decision-making processes of these models are generally not interpretable to users. In various domains, such as healthcare, finance, or law, it is critical to know the reasons behind a decision made by an artificial intelligence system. Therefore, several directions for explaining neural models have recently been explored. In this thesis, I investigate two major directions for explaining deep neural networks. The first direction consists of feature-based post-hoc explanatory methods, that is, methods that aim to explain an already trained and fixed model (post-hoc), and that provide explanations in terms of input features, such as tokens for text and superpixels for images (feature-based). The second direction consists of self-explanatory neural models that generate natural language explanations, that is, models that have a built-in module that generates explanations for the predictions of the model.
