MALT Powers Up Adversarial Attacks

Abstract

Current adversarial attacks for multi-class classifiers choose the target class for a given input naively, based on the classifier's confidence levels for various target classes. We present a novel adversarial targeting method, \textit{MALT - Mesoscopic Almost Linearity Targeting}, based on medium-scale almost linearity assumptions. Our attack wins over the current state-of-the-art AutoAttack on the standard benchmark datasets CIFAR-100 and ImageNet and for a variety of robust models. In particular, our attack is \emph{five times faster} than AutoAttack, while successfully matching all of AutoAttack's successes and attacking additional samples that were previously out of reach. We then prove formally and demonstrate empirically that our targeting method, although inspired by linear predictors, also applies to standard non-linear models.
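
To make the contrast between confidence-based and linearity-inspired targeting concrete, the following is a minimal sketch for an exactly linear classifier. It is not the paper's implementation: the abstract does not spell out the MALT scoring rule, so the ranking used here (the L2 distance from the input to the decision boundary between the true class and each candidate class) is an illustrative assumption that holds for linear predictors only. All function names, weights, and inputs are hypothetical.

```python
import math

def logits(W, b, x):
    """Logits of a linear classifier: f_k(x) = w_k . x + b_k."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + bk for w, bk in zip(W, b)]

def naive_target(W, b, x, y):
    """Confidence-based targeting: the non-true class with the highest logit."""
    z = logits(W, b, x)
    return max((k for k in range(len(z)) if k != y), key=lambda k: z[k])

def boundary_distance(W, b, x, y, j):
    """L2 distance from x to the boundary {f_y = f_j} of a linear classifier.

    Assumes w_y != w_j, otherwise the norm in the denominator is zero.
    """
    z = logits(W, b, x)
    diff = [wy - wj for wy, wj in zip(W[y], W[j])]
    return (z[y] - z[j]) / math.sqrt(sum(d * d for d in diff))

def linear_target(W, b, x, y):
    """Linearity-based targeting: the class whose boundary is closest to x."""
    candidates = (k for k in range(len(W)) if k != y)
    return min(candidates, key=lambda k: boundary_distance(W, b, x, y, k))

# Hypothetical 3-class linear model on 2-D inputs; the true class is 0.
W = [[2.0, 0.0], [1.5, 0.0], [0.0, 3.0]]
b = [0.0, 0.0, 0.0]
x = [1.0, 0.0]
y = 0

print(naive_target(W, b, x, y))   # class with the highest non-true logit
print(linear_target(W, b, x, y))  # class with the nearest decision boundary
```

In this toy model, class 1 has the higher logit, but the boundary toward class 2 is closer because the weight vectors of classes 0 and 2 differ more strongly, so the two rules pick different targets. For a non-linear network, the weight difference would have to be replaced by some local linear approximation, which is an assumption beyond what the abstract states.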
