Abstract
Foundation Language Models (FLMs) such as BERT and its variants have achieved remarkable success in natural language processing. To date, the interpretability of FLMs has primarily relied on the attention weights in their self-attention layers. However, these attention weights only provide word-level interpretations, failing to capture higher-level structures, and are therefore lacking in readability and intuitiveness. To address this challenge, we first provide a formal definition of conceptual interpretation and then propose a variational Bayesian framework, dubbed VAriational Language Concept (VALC), to go beyond word-level interpretations and provide concept-level interpretations. Our theoretical analysis shows that VALC finds the optimal language concepts to interpret FLM predictions. Empirical results on several real-world datasets show that our method can successfully provide conceptual interpretation for FLMs.