Adversarial Spheres

  • 2018-01-09 03:24:53
  • Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow


State-of-the-art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and very close to a visually similar misclassified image. Despite substantial research interest, the cause of the phenomenon is still poorly understood and remains unsolved. We hypothesize that this counterintuitive behavior is a naturally occurring result of the high-dimensional geometry of the data manifold. As a first step towards exploring this hypothesis, we study a simple synthetic dataset of classifying between two concentric high-dimensional spheres. For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to the nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$. Surprisingly, when we train several different architectures on this dataset, all of their error sets naturally approach this theoretical bound. As a result of the theory, the vulnerability of neural networks to small adversarial perturbations is a logical consequence of the amount of test error observed. We hope that our theoretical analysis of this very simple case will point the way forward to explore how the geometry of complex real-world data sets leads to adversarial examples.
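
The $O(1/\sqrt{d})$ scaling can be illustrated numerically with a minimal sketch. This is not the authors' code: the function names, the outer radius of 1.3, the 1% error rate, and the choice of a spherical cap as the error set are all assumptions made for illustration. The sketch samples points uniformly from a unit sphere, marks the cap $\{x : x_1 \ge t\}$ holding a fixed fraction of the points as "misclassified", and measures the average Euclidean distance from a sampled point to that cap.

```python
import numpy as np


def sample_sphere(n, d, radius, rng):
    """Draw n points uniformly from the sphere of the given radius in R^d."""
    z = rng.standard_normal((n, d))
    return radius * z / np.linalg.norm(z, axis=1, keepdims=True)


def make_dataset(n, d, outer_radius=1.3, seed=0):
    """Two concentric spheres: label 0 on radius 1, label 1 on outer_radius.

    outer_radius=1.3 is an illustrative assumption, not taken from the text above.
    """
    rng = np.random.default_rng(seed)
    inner = sample_sphere(n, d, 1.0, rng)
    outer = sample_sphere(n, d, outer_radius, rng)
    x = np.concatenate([inner, outer])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y


def mean_distance_to_cap(d, error_rate=0.01, n=4000, seed=0):
    """Average distance from unit-sphere samples to a cap of measure ~error_rate.

    The cap {x : x_1 >= t} stands in for a hypothetical error set.
    """
    rng = np.random.default_rng(seed)
    first = sample_sphere(n, d, 1.0, rng)[:, 0]
    t = np.quantile(first, 1.0 - error_rate)          # empirical cap threshold
    theta = np.arccos(np.clip(first, -1.0, 1.0))      # polar angle of each sample
    theta_cap = np.arccos(np.clip(t, -1.0, 1.0))      # polar angle of the cap edge
    gap = np.maximum(theta - theta_cap, 0.0)          # zero for points inside the cap
    return float(np.mean(2.0 * np.sin(gap / 2.0)))    # chord length on the unit sphere


if __name__ == "__main__":
    for d in (64, 256, 1024):
        dist = mean_distance_to_cap(d)
        print(d, dist, dist * np.sqrt(d))
```

Under these assumptions, the last printed column (distance times $\sqrt{d}$) should stay roughly constant as $d$ grows, which is the behavior the bound describes for this particular error set.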

