Adversarial Example Games

Abstract

The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks, in order to guide the development of safeguards against them. This includes attack methods in the highly challenging non-interactive blackbox setting, where adversarial attacks are generated without any access, including queries, to the target model. Prior works in this setting have relied mainly on algorithmic innovations derived from empirical observations (e.g., that momentum helps), and the field currently lacks a firm theoretical basis for understanding transferability in adversarial attacks. In this work, we address this gap and lay the theoretical foundations for crafting transferable adversarial examples to entire function classes. We introduce Adversarial Example Games (AEG), a novel framework that models adversarial examples as two-player min-max games between an attack generator and a representative classifier. We prove that the saddle point of an AEG game corresponds to a generating distribution of adversarial examples against entire function classes. Training the generator only requires the ability to optimize a representative classifier from a given hypothesis class, enabling BlackBox transfer to unseen classifiers from the same class. We demonstrate the efficacy of our approach on the MNIST and CIFAR-10 datasets against both undefended and robustified models, achieving competitive performance with state-of-the-art BlackBox transfer approaches.
