Abstract
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
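As an illustration of the marginally valid conformal p-values mentioned above, the following minimal sketch computes the standard rank-based p-value of each test point against a held-out calibration set of inlier scores. It is not taken from the paper: the function name `conformal_pvalues`, the use of precomputed scalar scores, and the convention that a larger score means a more outlying point are all assumptions made for this example.

```python
from bisect import bisect_left


def conformal_pvalues(cal_scores, test_scores):
    """Rank-based (marginal) conformal p-values.

    Assumed convention (not from the paper): a larger score means the
    point looks more like an outlier. cal_scores are the scores of
    held-out inliers; test_scores are the scores of the new points.
    """
    cal = sorted(cal_scores)
    n = len(cal)
    pvals = []
    for s in test_scores:
        # bisect_left counts calibration scores strictly below s,
        # so n minus that count is the number of scores >= s.
        n_ge = n - bisect_left(cal, s)
        # The +1 terms count the test point itself, which keeps the
        # p-value strictly positive.
        pvals.append((1 + n_ge) / (n + 1))
    return pvals


# Hypothetical scores, for illustration only.
cal_scores = [0.1, 0.2, 0.3, 0.4]
test_scores = [0.05, 0.35, 0.9]
print(conformal_pvalues(cal_scores, test_scores))
```

Because every test point is compared with the same calibration scores, the resulting p-values are mutually dependent, which is the situation the abstract describes.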