Improving Transferability of Adversarial Examples with Input Diversity

Abstract

Though convolutional neural networks have achieved state-of-the-art performance on various vision tasks, they are extremely vulnerable to adversarial examples, which are obtained by adding human-imperceptible perturbations to the original images. Adversarial examples can thus be used as a useful tool to evaluate and select the most robust models in safety-critical applications. However, most existing adversarial attacks achieve only relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate adversarial examples, our method applies random transformations to the input images at each iteration. Extensive experiments on ImageNet show that the proposed attack method can generate adversarial examples that transfer much better to different networks than existing baselines. To further improve the transferability, we (1) integrate the recently proposed momentum method into the attack process; and (2) attack an ensemble of networks simultaneously. When evaluated against the top defense submissions and official baselines from the NIPS 2017 adversarial competition, this enhanced attack reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a benchmark for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. The code is publicly available at https://github.com/cihangxie/DI-2-FGSM.
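
The core idea above, applying a fresh random transformation to the input at every attack iteration, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the transformation (nearest-neighbour resizing to a random size followed by random zero-padding), the toy linear softmax classifier `W`, the iterative sign-gradient update, and all function and parameter names (`random_resize_pad`, `diverse_input_attack`, `eps`, `steps`). Momentum and ensemble attacks are omitted. It is not the released implementation.

```python
import numpy as np


def random_resize_pad(x, out_size, rng):
    """Assumed transformation: nearest-neighbour resize of a square image to a
    random side length r, then zero-pad at a random offset to out_size x out_size.
    Returns the transformed image and the info needed to map gradients back."""
    h = x.shape[0]
    r = int(rng.integers(h, out_size + 1))       # random side length in [h, out_size]
    idx = np.arange(r) * h // r                  # source index for each resized pixel
    top = int(rng.integers(0, out_size - r + 1))
    left = int(rng.integers(0, out_size - r + 1))
    out = np.zeros((out_size, out_size), dtype=x.dtype)
    out[top:top + r, left:left + r] = x[np.ix_(idx, idx)]
    return out, (idx, top, left, r)


def backprop_through_transform(grad_out, info, in_shape):
    """Map a gradient w.r.t. the transformed image back to the original image
    (the transpose of the linear resize-and-pad map)."""
    idx, top, left, r = info
    grad_in = np.zeros(in_shape, dtype=grad_out.dtype)
    patch = grad_out[top:top + r, left:left + r]
    np.add.at(grad_in, np.ix_(idx, idx), patch)  # accumulate over duplicated pixels
    return grad_in


def loss_grad(W, x_t, label):
    """Gradient of softmax cross-entropy w.r.t. the input of a toy linear model."""
    logits = W @ x_t.ravel()
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[label] -= 1.0
    return (W.T @ p).reshape(x_t.shape)


def diverse_input_attack(x, label, W, out_size, eps=0.1, steps=10, seed=0):
    """Iterative sign-gradient attack that re-samples the transform at each step."""
    rng = np.random.default_rng(seed)
    alpha = eps / steps                          # per-step size (assumed schedule)
    x_adv = x.copy()
    for _ in range(steps):
        x_t, info = random_resize_pad(x_adv, out_size, rng)
        g = backprop_through_transform(loss_grad(W, x_t, label), info, x.shape)
        x_adv = x_adv + alpha * np.sign(g)       # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps) # stay inside the L-infinity ball
        x_adv = np.clip(x_adv, 0.0, 1.0)         # stay a valid image
    return x_adv


# Hypothetical usage on random data: an 8x8 "image" and a 3-class linear model
# that consumes 12x12 transformed inputs.
demo_rng = np.random.default_rng(1)
x_demo = demo_rng.random((8, 8))
W_demo = demo_rng.normal(size=(3, 12 * 12))
x_adv_demo = diverse_input_attack(x_demo, label=0, W=W_demo, out_size=12)
```

Because the transformation is re-sampled at every step, the accumulated perturbation is not fitted to a single fixed view of the input, which is the intuition the abstract gives for better transfer to unseen models.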
