Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models

Abstract

Concept-based methods have emerged as a promising direction to develop interpretable neural networks in standard supervised settings. However, most works that study them in incremental settings either assume a static concept set across all experiences or assume that each experience relies on a distinct set of concepts. In this work, we study concept-based models in a more realistic, dynamic setting where new classes may rely on older concepts in addition to introducing new concepts themselves. We show that concepts and classes form a complex web of relationships, which is susceptible to degradation and needs to be preserved and augmented across experiences. We introduce new metrics to show that existing concept-based models cannot preserve these relationships even when trained using methods to prevent catastrophic forgetting, since they cannot handle forgetting at concept, class, and concept-class relationship levels simultaneously. To address these issues, we propose a novel method, MuCIL, that uses multimodal concepts to perform classification without increasing the number of trainable parameters across experiences. The multimodal concepts are aligned to concepts provided in natural language, making them interpretable by design. Through extensive experimentation, we show that our approach obtains state-of-the-art classification performance compared to other concept-based models, achieving over 2$\times$ the classification performance in some cases. We also study the ability of our model to perform interventions on concepts, and show that it can localize visual concepts in input images, providing post-hoc interpretations.
