Switching Gradient Directions for Query-Efficient Black-Box Adversarial Attacks

Abstract

We propose a simple and highly query-efficient black-box adversarial attack named SWITCH, which achieves state-of-the-art performance under the $\ell_2$ and $\ell_\infty$ norms in the score-based setting. In the black-box attack setting, designing query-efficient attacks remains an open problem. The high query efficiency of the proposed approach stems from the combination of transfer-based attacks and random-search-based ones. The surrogate model's gradient $\hat{\mathbf{g}}$ is exploited for guidance, and it is switched if our algorithm detects, by using a query, that it does not point to the adversarial region, thereby keeping the objective loss function of the target model rising as much as possible. Two switch operations are available, i.e., SWITCH$_\text{neg}$ and SWITCH$_\text{rnd}$. SWITCH$_\text{neg}$ takes $-\hat{\mathbf{g}}$ as the new direction, which is reasonable under an approximate local linearity assumption. SWITCH$_\text{rnd}$ computes the gradient from another model, which is randomly selected from a large model set, to help bypass the potential obstacle in optimization. Experimental results show that these strategies boost the optimization process, whereas following the original surrogate gradients does not work. In SWITCH, no query is used to estimate the gradient, and all the queries aim to determine whether to switch directions, resulting in unprecedented query efficiency. We demonstrate that our approach outperforms 10 state-of-the-art attacks on the CIFAR-10, CIFAR-100 and TinyImageNet datasets. SWITCH can serve as a strong baseline for future black-box attacks. The PyTorch source code is released at https://github.com/machanic/SWITCH.
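
The SWITCH$_\text{neg}$ rule described in the abstract can be illustrated with a small sketch for the $\ell_\infty$ case. This is a toy illustration under stated assumptions, not the authors' released implementation: the names `switch_neg_step`, `loss_fn`, `step_size` and `epsilon` are hypothetical, the sign-of-gradient update and the $\epsilon$-ball clipping are standard choices assumed here, and querying the loss again after the flip is also an assumption. Plain Python lists stand in for image tensors.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def sign(v: float) -> float:
    """Return -1.0, 0.0 or 1.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def switch_neg_step(
    x: Vector,
    x_orig: Vector,
    current_loss: float,
    loss_fn: Callable[[Vector], float],  # one call = one query to the target model
    surrogate_grad: Vector,              # g_hat, computed on the surrogate (no queries)
    step_size: float,
    epsilon: float,
) -> Tuple[Vector, float, bool]:
    """One hypothetical SWITCH_neg step; returns (new_x, new_loss, switched)."""

    def move(direction: Vector) -> Vector:
        # Sign step along `direction`, clipped to the l_inf ball around x_orig.
        out = []
        for xi, oi, di in zip(x, x_orig, direction):
            cand = xi + step_size * sign(di)
            out.append(min(max(cand, oi - epsilon), oi + epsilon))
        return out

    # Spend one query to check whether g_hat keeps the target loss rising.
    candidate = move(surrogate_grad)
    candidate_loss = loss_fn(candidate)
    if candidate_loss >= current_loss:
        return candidate, candidate_loss, False

    # Otherwise switch to -g_hat (assumes approximate local linearity).
    flipped = move([-g for g in surrogate_grad])
    return flipped, loss_fn(flipped), True


# Toy target loss: linear, with true gradient (1, -2).
def toy_loss(v: Vector) -> float:
    return 1.0 * v[0] - 2.0 * v[1]


start = [0.0, 0.0]
# A surrogate gradient pointing the wrong way triggers the switch.
bad_x, bad_loss, bad_switched = switch_neg_step(
    start, start, toy_loss(start), toy_loss, [-1.0, 2.0], 0.1, 0.5
)
# A surrogate gradient aligned with the true one is followed as-is.
good_x, good_loss, good_switched = switch_neg_step(
    start, start, toy_loss(start), toy_loss, [1.0, -2.0], 0.1, 0.5
)
```

In this sketch every call to `loss_fn` is spent on deciding whether to keep or flip the direction, never on estimating a gradient, which mirrors the query accounting the abstract describes. SWITCH$_\text{rnd}$ would replace the flipped direction with a gradient from a randomly chosen second surrogate, which this sketch does not cover.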
