Towards Explainable NLP: A Generative Explanation Framework for Text Classification

Abstract

Building explainable systems is a critical problem in the field of NaturalLanguage Processing (NLP), since most machine learning models provide noexplanations for the predictions. Existing approaches for explainable machinelearning systems tend to focus on interpreting the outputs or the connectionsbetween inputs and outputs. However, the fine-grained information is oftenignored, and the systems do not explicitly generate the human-readableexplanations. To better alleviate this problem, we propose a novel generativeexplanation framework that learns to make classification decisions and generatefine-grained explanations at the same time. More specifically, we introduce theexplainable factor and the minimum risk training approach that learn togenerate more reasonable explanations. We construct two new datasets thatcontain summaries, rating scores, and fine-grained reasons. We conductexperiments on both datasets, comparing with several strong neural networkbaseline systems. Experimental results show that our method surpasses allbaselines on both datasets, and is able to generate concise explanations at thesame time.

Quick Read (beta)

loading the full paper ...