Generating adversarial examples with adversarial networks

Abstract

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs into producing adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires further research. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, which could potentially accelerate adversarial training as a defense. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models achieve high attack success rates under state-of-the-art defenses compared to other attacks. Our attack placed first, with 92.76% accuracy, on a public MNIST black-box attack challenge.
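
To illustrate why a trained generator makes the attack efficient, the following is a minimal sketch of the inference step only: one forward pass maps an input to a bounded perturbation, which is added to the input. Everything here is an assumption for illustration and is not taken from the paper: the names `make_toy_generator` and `craft_adversarial`, the bound `epsilon`, the [0, 1] input range, and the stand-in generator itself, which uses fixed random weights rather than a trained GAN generator.

```python
import math
import random


def make_toy_generator(dim, seed=0):
    """Return a stand-in generator: a fixed random linear map followed by tanh.

    Hypothetical placeholder for a trained generator; the weights are random,
    not learned, so the output is not actually adversarial.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def generator(x):
        # tanh keeps every raw perturbation component inside (-1, 1).
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return generator


def craft_adversarial(x, generator, epsilon=0.3):
    """Single forward pass: x_adv = clip(x + epsilon * G(x), 0, 1).

    epsilon is an assumed per-component perturbation bound.
    No access to the target model is needed at this stage.
    """
    perturbation = [epsilon * g for g in generator(x)]
    return [min(1.0, max(0.0, xi + p)) for xi, p in zip(x, perturbation)]


if __name__ == "__main__":
    g = make_toy_generator(dim=4)
    x = [0.0, 0.25, 0.5, 1.0]  # toy "image" with pixel values in [0, 1]
    print(craft_adversarial(x, g, epsilon=0.3))
```

In this sketch the cost per example is a single generator evaluation, in contrast to attacks that run an optimization loop against the target model for every input.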
