Motivating the Rules of the Game for Adversarial Example Research

Abstract

Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such examples to abuse deployed systems. As a result of these security concerns, there has been a flurry of recent papers proposing algorithms to defend against such malicious perturbations of correctly handled examples. It is unclear how such misclassifications represent a different kind of security problem than other errors, or even other attacker-produced examples that have no specific relationship to an uncorrupted input. In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security. Towards this end, we establish a taxonomy of motivations, constraints, and abilities for more plausible adversaries. Finally, we provide a series of recommendations outlining a path forward for future work to more clearly articulate the threat model and perform more meaningful evaluation.
