WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales

Abstract

Responsibly deploying artificial intelligence (AI) / machine learning (ML) systems in high-stakes settings arguably requires not only proof of system reliability, but also continual, post-deployment monitoring to quickly detect and address any unsafe behavior. Statistical methods for nonparametric change-point detection, especially the tools of conformal test martingales (CTMs) and anytime-valid inference, offer promising approaches to this monitoring task. However, existing methods are restricted to monitoring limited hypothesis classes or "alarm criteria" (such as data shifts that violate certain exchangeability assumptions), do not allow for online adaptation in response to shifts, and/or do not enable root-cause analysis of any degradation. In this paper, we expand the scope of these monitoring methods by proposing a weighted generalization of conformal test martingales (WCTMs), which lay a theoretical foundation for online monitoring for any unexpected changepoints in the data distribution while controlling false alarms. For practical applications, we propose specific WCTM algorithms that adapt online to mild covariate shifts (in the marginal input distribution) while quickly detecting and diagnosing more severe shifts, such as concept shifts (in the conditional label distribution) or extreme (out-of-support) covariate shifts that cannot be easily adapted to. On real-world datasets, we demonstrate improved performance relative to state-of-the-art baselines.
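The monitoring idea can be illustrated with a minimal Python sketch of a weighted conformal test martingale. It assumes a precomputed nonconformity score per observation (for example, an absolute prediction residual), a user-supplied positive weight function `weight_fn` (hypothetical; a stand-in for something like an estimated density ratio), and the power betting function f(p) = ε·p^(ε−1), a standard choice in the conformal test martingale literature. Each step computes a randomized weighted conformal p-value for the newest score, multiplies the running martingale by f(p), and raises an alarm once the martingale reaches 1/α. With unit weights this is an ordinary CTM, for which Ville's inequality bounds the probability of ever alarming by α under exchangeability. The names `weighted_conformal_pvalue` and `WeightedConformalMonitor` are illustrative, and this is not the WATCH algorithm: the paper's weight construction, online adaptation, root-cause diagnosis, and validity conditions for the weighted case are not reproduced here.

```python
import math
import random
from typing import Any, Callable, List, Optional


def weighted_conformal_pvalue(scores: List[float], weights: List[float], theta: float) -> float:
    """Randomized weighted conformal p-value of the newest score (scores[-1]).

    Weights are normalized to sum to 1 over all points seen so far, including
    the newest one. With equal weights this is the usual smoothed conformal
    p-value: (#{i: s_i > s_t} + theta * #{i: s_i == s_t}) / t.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    s_t = scores[-1]
    greater = sum(p for s, p in zip(scores, probs) if s > s_t)
    ties = sum(p for s, p in zip(scores, probs) if s == s_t)
    return greater + theta * ties


class WeightedConformalMonitor:
    """Hypothetical monitor: power-betting test martingale on weighted conformal p-values."""

    def __init__(
        self,
        epsilon: float = 0.5,
        alpha: float = 0.01,
        weight_fn: Optional[Callable[[Any], float]] = None,
        seed: int = 0,
    ) -> None:
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must lie in (0, 1)")
        self.epsilon = epsilon
        # Alarm threshold 1/alpha, tracked on the log scale to avoid overflow.
        self.log_threshold = math.log(1.0 / alpha)
        # Default: unit weights, i.e. an ordinary (unweighted) CTM.
        self.weight_fn = weight_fn if weight_fn is not None else (lambda x: 1.0)
        self.rng = random.Random(seed)
        self.scores: List[float] = []
        self.weights: List[float] = []
        self.log_martingale = 0.0  # log M_0 = log 1

    def update(self, x: Any, score: float) -> bool:
        """Ingest one observation (covariates x, nonconformity score); return True on alarm."""
        w = self.weight_fn(x)
        if w <= 0.0:
            raise ValueError("weights must be strictly positive")
        self.scores.append(score)
        self.weights.append(w)
        theta = 1.0 - self.rng.random()  # uniform on (0, 1], so p > 0 below
        p = weighted_conformal_pvalue(self.scores, self.weights, theta)
        # Power betting function f(p) = eps * p**(eps - 1) integrates to 1 on [0, 1];
        # small p-values (surprisingly large scores) make the martingale grow.
        self.log_martingale += math.log(self.epsilon) + (self.epsilon - 1.0) * math.log(p)
        return self.log_martingale >= self.log_threshold


if __name__ == "__main__":
    rng = random.Random(1)
    monitor = WeightedConformalMonitor(epsilon=0.5, alpha=0.01)
    for t in range(1, 401):
        # Simulated |residual| scores whose scale triples after t = 200.
        scale = 1.0 if t <= 200 else 3.0
        if monitor.update(x=None, score=abs(rng.gauss(0.0, scale))):
            print(f"alarm at t={t}")
            break
```

Tracking the martingale on the log scale keeps long monitoring runs numerically stable, and drawing the tie-breaking variable from (0, 1] rather than [0, 1) keeps every p-value strictly positive so the power betting function never divides by zero.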
