Automatic Model Monitoring for Data Streams

Abstract

Detecting concept drift is a well-known problem that affects production systems. However, two important issues that are frequently not addressed in the literature are 1) the detection of drift when the labels are not immediately available; and 2) the automatic generation of explanations to identify possible causes for the drift. For example, a fraud detection model in online payments could show a drift due to a hot sale item (with an increase in false positives) or due to a true fraud attack (with an increase in false negatives) before labels are available. In this paper we propose SAMM, an automatic model monitoring system for data streams. SAMM detects concept drift using a time- and space-efficient unsupervised streaming algorithm, and it generates alarm reports with a summary of the events and features that are important to explain the drift. SAMM was evaluated on five real-world fraud detection datasets, each spanning periods of up to eight months and totaling more than 22 million online transactions. We evaluated SAMM using human feedback from domain experts, by sending them 100 reports generated by the system. Our results show that SAMM is able to detect anomalous events in a model's life cycle that are considered useful by the domain experts. Given these results, SAMM will be rolled out in a next version of Feedzai's Fraud Detection solution.
