With Friends Like These, Who Needs Adversaries?

Abstract

The vulnerability of deep image classification networks to adversarial attack is now well known, but less well understood. Via a novel experimental analysis, we illustrate some facts about deep convolutional networks (DCNs) that shed new light on their behaviour and its connection to the problem of adversaries, with two key results. The first is a straightforward explanation of the existence of universal adversarial perturbations and their association with specific class identities, obtained by analysing the properties of nets' logit responses as functions of 1D movements along specific image-space directions. The second is the clear demonstration of the tight coupling between classification performance and vulnerability to adversarial attack within the spaces spanned by these directions. Prior work has noted the importance of low-dimensional subspaces in adversarial vulnerability: we illustrate that this likewise represents the nets' notion of saliency. In all, we provide a digestible perspective from which to understand previously reported results which have appeared disjoint or contradictory, with implications for efforts to construct neural nets that are both accurate and robust to adversarial attack.
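The central probe described above, a net's logit response as a function of 1D movement along an image-space direction, can be illustrated with a short sketch. This is not the authors' code. It assumes a toy two-layer ReLU network with random weights as a stand-in for a trained DCN. Since the abstract does not say how the "specific image-space directions" are chosen, the image `x0` and direction `d` are random placeholders. The names `make_toy_net` and `logit_profile` are hypothetical.

```python
import numpy as np


def make_toy_net(in_dim, hidden, n_classes, seed=0):
    """Hypothetical stand-in for a trained DCN: a random two-layer ReLU net."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((hidden, in_dim)) / np.sqrt(in_dim)
    W = rng.standard_normal((n_classes, hidden)) / np.sqrt(hidden)

    def logits(x):
        # x: (..., in_dim) -> (..., n_classes); no biases, for simplicity
        return np.maximum(x @ V.T, 0.0) @ W.T

    return logits


def logit_profile(logits_fn, x0, direction, ts):
    """Evaluate logits at x0 + t * d for each scalar t (a 1D slice of image space)."""
    d = direction / np.linalg.norm(direction)  # normalise to a unit direction
    points = x0[None, :] + ts[:, None] * d[None, :]  # shape (len(ts), in_dim)
    return logits_fn(points)  # shape (len(ts), n_classes)


if __name__ == "__main__":
    in_dim, n_classes = 32 * 32 * 3, 10  # assumed CIFAR-sized input, 10 classes
    net = make_toy_net(in_dim, hidden=256, n_classes=n_classes)
    rng = np.random.default_rng(1)
    x0 = rng.standard_normal(in_dim)  # placeholder "image"
    d = rng.standard_normal(in_dim)  # placeholder direction (assumption)
    ts = np.linspace(-50.0, 50.0, 101)  # step sizes along the line
    prof = logit_profile(net, x0, d, ts)
    print(prof.shape)  # (101, 10)
    # Which classes "win" somewhere along this 1D slice
    print(sorted(set(prof.argmax(axis=1).tolist())))
```

Plotting each column of `prof` against `ts` gives one logit curve per class along the chosen line. With a trained network and meaningful directions substituted for the placeholders, such curves are the kind of object the analysis above examines.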
