Abstract
Deep neural networks are becoming increasingly popular due to their revolutionary success in diverse areas, such as computer vision, natural language processing, and speech recognition. However, the decision-making processes of these models are generally not interpretable to users. In various domains, such as healthcare, finance, or law, it is critical to know the reasons behind a decision made by an artificial intelligence system. Therefore, several directions for explaining neural models have recently been explored. In this thesis, I investigate two major directions for explaining deep neural networks. The first direction consists of feature-based post-hoc explanatory methods, that is, methods that aim to explain an already trained and fixed model (post-hoc), and that provide explanations in terms of input features, such as tokens for text and superpixels for images (feature-based). The second direction consists of self-explanatory neural models that generate natural language explanations, that is, models that have a built-in module that generates explanations for the predictions of the model.