Abstract
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we create the first adversarial examples designed to fool humans, by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by modifying models to more closely match the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.