LExT: Towards Evaluating Trustworthiness of Natural Language Explanations

Abstract

As Large Language Models (LLMs) become increasingly integrated into high-stakes domains, several approaches have been proposed for generating natural language explanations. These explanations are crucial for enhancing the interpretability of a model, especially in sensitive domains like healthcare, where transparency and reliability are key. Because such explanations are themselves generated by LLMs, with their known concerns, there is a growing need for robust evaluation frameworks to assess model-generated explanations. Natural Language Generation metrics like BLEU and ROUGE capture syntactic and semantic accuracies but overlook other crucial aspects such as factual accuracy, consistency, and faithfulness. To address this gap, we propose a general framework for quantifying the trustworthiness of natural language explanations, balancing Plausibility and Faithfulness, to derive a comprehensive Language Explanation Trustworthiness Score (LExT). The code and setup to reproduce our experiments are publicly available at https://github.com/cerai-iitm/LExT. Applying our domain-agnostic framework to the healthcare domain using public medical datasets, we evaluate six models, including domain-specific and general-purpose models. Our findings demonstrate significant differences in their ability to generate trustworthy explanations. Comparing these explanations, we observe that general-purpose models are inconsistent in Faithfulness yet tend to outperform domain-specific fine-tuned models. This work further highlights the importance of using a tailored evaluation framework to assess natural language explanations in sensitive fields, providing a foundation for improving the trustworthiness and transparency of language models in healthcare and beyond.
