Unsupervised Model Selection for Time-series Anomaly Detection

Abstract

Anomaly detection in time-series has a wide range of practical applications. While numerous anomaly detection methods have been proposed in the literature, a recent survey concluded that no single method is the most accurate across various datasets. To make matters worse, anomaly labels are scarce and rarely available in practice. The practical problem of selecting the most accurate model for a given dataset without labels has received little attention in the literature. This paper addresses that problem: given an unlabeled dataset and a set of candidate anomaly detectors, how can we select the most accurate model? To this end, we identify three classes of surrogate (unsupervised) metrics, namely prediction error, model centrality, and performance on injected synthetic anomalies, and show that some of these metrics are highly correlated, to varying degrees, with standard supervised anomaly detection performance metrics such as the $F_1$ score. We formulate the combination of multiple imperfect surrogate metrics as a robust rank aggregation problem and provide theoretical justification for the proposed approach. Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model based on partially labeled data.
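
To make the idea of combining several imperfect surrogate metrics concrete, the sketch below aggregates per-metric model rankings with a plain Borda count. This is an illustration only: the abstract does not specify the paper's robust aggregation procedure, so Borda count is a swapped-in, simpler technique, and the function name `borda_aggregate` and the score values are hypothetical. It assumes every metric has already been oriented so that higher means better (for example, by negating prediction error).

```python
import numpy as np

def borda_aggregate(scores):
    """Aggregate per-metric model scores with a Borda count.

    scores: array of shape (n_metrics, n_models), higher is better.
    Returns (order, points): model indices sorted best-first, and the
    total Borda points per model.
    NOTE: illustrative stand-in, not the paper's robust aggregation.
    """
    scores = np.asarray(scores, dtype=float)
    # Applying argsort twice converts scores to ranks within each metric:
    # 0 for the worst model, n_models - 1 for the best.
    ranks = scores.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable")
    points = ranks.sum(axis=0)
    # Sort by descending points; a stable sort keeps the lower index on ties.
    order = np.argsort(-points, kind="stable")
    return order, points

# Hypothetical scores: 3 surrogate metrics (rows) for 3 candidate models (columns).
scores = np.array([
    [0.9, 0.5, 0.1],  # e.g. negated prediction error
    [0.8, 0.6, 0.2],  # e.g. model centrality
    [0.5, 0.9, 0.4],  # e.g. F1 on injected synthetic anomalies
])
order, points = borda_aggregate(scores)
print(order.tolist(), points.tolist())
```

In this toy input the third metric disagrees with the other two about the best model, yet the aggregate still prefers model 0 because it ranks well under every metric. A robust variant would additionally down-weight or discard metrics that disagree strongly with the consensus.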
