Efficient and Transferable Adversarial Examples from Bayesian Neural Networks

Abstract

An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on a surrogate ensemble model to increase diversity. We argue that transferability is fundamentally related to epistemic uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet and CIFAR-10 show that our approach improves the transfer rates of four state-of-the-art attacks significantly (up to 62.1 percentage points), in both intra-architecture and inter-architecture cases. On ImageNet, our approach can reach a 94% transfer rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves higher transferability than three test-time techniques designed for this purpose 87.5% of the time. Our work demonstrates that the way to train a surrogate has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work.
