Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

Abstract

There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context has focused on the attribution of responsibility for an algorithm's decisions to its inputs, wherein responsibility is typically approached as a purely associational concept. In this paper, we propose a principled causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lie probabilistic contrastive counterfactuals, a concept that can be traced back to philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influences of a variable on decisions made by an algorithm, and provide actionable recourse for individuals negatively affected by the algorithm's decision. Unlike prior work, our system, LEWIS: (1) can compute provably effective explanations and recourse at local, global, and contextual levels; (2) is designed to work with users with varying levels of background knowledge of the underlying causal model; and (3) makes no assumptions about the internals of an algorithmic system except for the availability of its input-output data. We empirically evaluate LEWIS on three real-world datasets and show that it generates human-understandable explanations that improve upon state-of-the-art approaches in XAI, including the popular LIME and SHAP. Experiments on synthetic data further demonstrate the correctness of LEWIS's explanations and the scalability of its recourse algorithm.
