Big but Imperceptible Adversarial Perturbations via Semantic Manipulation

Abstract

Machine learning, especially deep learning, is widely applied to a range of applications including computer vision, robotics and natural language processing. However, it has been shown that machine learning models are vulnerable to adversarial examples: carefully crafted samples that deceive learning models. In-depth studies of adversarial examples can help us better understand potential vulnerabilities and therefore improve model robustness. Recent works have introduced various methods that generate adversarial examples. However, they all require the perturbation to have a small magnitude, measured by an $\mathcal{L}_p$ norm, in order to remain imperceptible to humans, a constraint that makes them hard to deploy in practice. In this paper we propose two novel methods, tAdv and cAdv, which leverage texture transfer and colorization to generate natural-looking perturbations with large $\mathcal{L}_p$ norms. We conduct extensive experiments to show that the proposed methods are general enough to attack both image classification and image captioning tasks on the ImageNet and MSCOCO datasets. In addition, we conduct comprehensive user studies under various conditions to show that our generated adversarial examples are imperceptible to humans even when the perturbations are large. We also evaluate the transferability and robustness of the proposed attacks against several state-of-the-art defenses.
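The abstract contrasts its attacks with prior methods that keep the $\mathcal{L}_p$ norm of the perturbation $\delta = x_{adv} - x$ small. As a minimal sketch of how that magnitude is commonly measured, the code below assumes an image flattened to a list of floats in $[0, 1]$; the helper names `perturbation` and `lp_norm` are hypothetical and are not taken from the paper. It does not implement tAdv or cAdv.

```python
import math
from typing import List, Sequence


def perturbation(x: Sequence[float], x_adv: Sequence[float]) -> List[float]:
    """Element-wise difference delta = x_adv - x between two flattened images."""
    if len(x) != len(x_adv):
        raise ValueError("images must have the same number of values")
    return [a - b for a, b in zip(x_adv, x)]


def lp_norm(delta: Sequence[float], p: float) -> float:
    """L_p norm of a flattened perturbation (p = 0, p >= 1, or math.inf)."""
    if p == 0:
        # L_0: number of pixel values that changed at all
        return float(sum(1 for d in delta if d != 0))
    if math.isinf(p):
        # L_inf: largest change applied to any single pixel value
        return max((abs(d) for d in delta), default=0.0)
    if p < 1:
        raise ValueError("p must be 0, >= 1, or math.inf")
    # General case: (sum |d|^p)^(1/p)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)


# Hypothetical 4-value "image": recoloring one region changes few values,
# but each by a large amount, so L_inf and L_2 are large.
x = [0.2, 0.2, 0.8, 0.8]
x_adv = [0.2, 0.2, 0.3, 0.3]
delta = perturbation(x, x_adv)
for p in (0, 1, 2, math.inf):
    print(p, lp_norm(delta, p))
```

A semantic change such as recoloring a region produces a perturbation that is large under $\mathcal{L}_\infty$ and $\mathcal{L}_2$, which is exactly the regime that norm-bounded attacks exclude.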
