Natural Adversarial Examples

Abstract

We introduce natural adversarial examples -- real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. We curate 7,500 natural adversarial examples and release them in an ImageNet classifier test set that we call ImageNet-A. This dataset serves as a new way to measure classifier robustness. Like l_p adversarial examples, ImageNet-A examples successfully transfer to unseen or black-box classifiers. For example, on ImageNet-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%. Recovering this accuracy is not simple because ImageNet-A examples exploit deep flaws in current classifiers including their over-reliance on color, texture, and background cues. We observe that popular training techniques for improving robustness have little effect, but we show that some architectural changes can enhance robustness to natural adversarial examples. Future research is required to enable robust generalization to this hard ImageNet test set.
