Multiple Outlier Testing with Conformal p-values

Abstract

This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
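
As a rough illustration of the marginal construction mentioned above, the following sketch computes standard split-conformal p-values from a held-out calibration set of inlier scores and then applies the Benjamini-Hochberg procedure. Several things here are assumptions rather than details taken from the abstract: the convention that larger scores are more outlying, the choice of Benjamini-Hochberg as the false discovery rate procedure, and the function names `conformal_pvalues` and `benjamini_hochberg`, which are hypothetical. It does not implement the conditionally calibrated p-values that the paper introduces.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values from calibration (inlier) scores.

    Assumed convention: a larger score means a more outlying point.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # Count calibration scores that are >= each test score.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank k whose sorted p-value is under its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal = rng.normal(size=1000)  # scores of known inliers
    test = np.concatenate([rng.normal(size=90), rng.normal(loc=4.0, size=10)])
    pvals = conformal_pvalues(cal, test)
    print("flagged as outliers:", int(benjamini_hochberg(pvals, alpha=0.1).sum()))
```

With n calibration scores, the smallest attainable p-value is 1/(n+1), so the calibration set has to be large relative to the number of test points for any rejection to be possible at a small level.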
