On the Nature and Types of Anomalies: A Review

Abstract

Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations the typology employs four dimensions: data type, cardinality of relationship, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.
