A principled approach for generating adversarial images under non-smooth dissimilarity metrics

Abstract

Deep neural networks perform well on real-world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by $\ell_p$ norms, but in fact any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, $\ell_1$, $\ell_2$, and $\ell_\infty$ perturbations; the $\ell_0$ counting "norm" (i.e. true sparseness); and the total variation seminorm, which is a (non-$\ell_p$) convolutional dissimilarity measuring local pixel changes. Our approach is a natural extension of a recent adversarial attack method, and eliminates the differentiability requirement of the metric. We demonstrate our algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet-1k datasets. We consider undefended and defended models, and show that our algorithm easily transfers to various datasets. We observe that ProxLogBarrier outperforms a host of modern adversarial attacks specialized for the $\ell_0$ case. Moreover, by altering images in the total variation seminorm, we shed light on a new class of perturbations that exploit neighboring pixel information.
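
For readers unfamiliar with the phrase "closed proximal form", the standard textbook definition may help. It is not taken from the paper: the symbols $g$, $\lambda$, $v$, and $x$ below are assumptions introduced only for illustration, and the paper's own notation and scaling may differ. For a (possibly non-smooth) function $g$ and a step size $\lambda > 0$, the proximal operator is

$$
\operatorname{prox}_{\lambda g}(v) = \arg\min_{x} \left\{ g(x) + \frac{1}{2\lambda} \| x - v \|_2^2 \right\}.
$$

Two of the metrics named above admit well-known coordinate-wise solutions. For $g(x) = \|x\|_1$ the operator is soft thresholding, and for $g(x) = \|x\|_0$ it is hard thresholding:

$$
\left[ \operatorname{prox}_{\lambda \|\cdot\|_1}(v) \right]_i = \operatorname{sign}(v_i) \max\left( |v_i| - \lambda,\ 0 \right),
\qquad
\left[ \operatorname{prox}_{\lambda \|\cdot\|_0}(v) \right]_i =
\begin{cases}
v_i, & |v_i| > \sqrt{2\lambda}, \\
0, & \text{otherwise}.
\end{cases}
$$

Neither $\|\cdot\|_1$ nor $\|\cdot\|_0$ is differentiable everywhere, yet both operators can be evaluated exactly. This is the sense in which a proximal step can stand in for a gradient step on the metric.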
