Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks

Abstract

With the wide deployment of machine learning (ML) based systems for a variety of applications including medical, military, automotive, genomic, as well as multimedia and social networking, there is great potential for damage from adversarial learning (AL) attacks. In this paper, we provide a contemporary survey of AL, focused particularly on defenses against attacks on statistical classifiers. After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks and particularly defenses against same. In so doing, we distinguish robust classification from anomaly detection (AD), unsupervised from supervised, and statistical hypothesis-based defenses from ones that do not have an explicit null (no attack) hypothesis; we identify the hyperparameters a particular method requires, its computational complexity, as well as the performance measures on which it was evaluated and the obtained quality. We then dig deeper, providing novel insights that challenge conventional AL wisdom and that target unresolved issues, including: 1) robust classification versus AD as a defense strategy; 2) the belief that attack success increases with attack strength, which ignores susceptibility to AD; 3) small perturbations for test-time evasion attacks: a fallacy or a requirement?; 4) validity of the universal assumption that a TTE attacker knows the ground-truth class for the example to be attacked; 5) black, grey, or white box attacks as the standard for defense evaluation; 6) susceptibility of query-based RE to an AD defense. We then present benchmark comparisons of several defenses against TTE, RE, and backdoor DP attacks on images. The paper concludes with a discussion of future work.
