Learning to Reject Low-Quality Explanations via User Feedback

  • 2025-07-18 09:14:45
  • Luca Stradiotti, Dario Pesenti, Stefano Teso, Jesse Davis

Abstract

Machine Learning predictors are increasingly being employed in high-stakes applications such as credit scoring. Explanations help users unpack the reasons behind their predictions, but are not always "high quality". That is, end-users may have difficulty interpreting or believing them, which can complicate trust assessment and downstream decision-making. We argue that classifiers should have the option to refuse handling inputs whose predictions cannot be explained properly and introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates the quality of explanations. In this problem setting, the key challenges are how to properly define and assess explanation quality and how to design a suitable rejector. Focusing on popular attribution techniques, we introduce ULER (User-centric Low-quality Explanation Rejector), which learns a simple rejector from human ratings and per-feature relevance judgments to mirror human judgments of explanation quality. Our experiments show that ULER outperforms both state-of-the-art and explanation-aware learning-to-reject strategies at LtX on eight classification and regression benchmarks and on a new human-annotated dataset, which we will publicly release to support future research.

 
