Adversarial Reprogramming of Neural Networks

Abstract

Deep neural networks are susceptible to adversarial attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as identifying a panda as a gibbon or confusing a cat with a computer. Previous adversarial examples have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce adversarial attacks that instead reprogram the target model to perform a task chosen by the attacker, without the attacker needing to specify or compute the desired output for each test-time input. This attack is accomplished by optimizing for a single adversarial perturbation, of unrestricted magnitude, that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary when processing these inputs, even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as two classification tasks: classification of MNIST and CIFAR-10 examples presented within the input to the ImageNet model.
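
To make the idea of "a single perturbation added to all test-time inputs" concrete, the following is a minimal NumPy sketch of how a reprogrammed input might be assembled. Several details are assumptions that the abstract does not state: the small task image is placed at the center of a blank ImageNet-sized canvas, the perturbation is parameterized as tanh of a masked weight tensor (which only keeps pixel values in a valid range and does not impose a small-norm constraint), and the first few host-model classes are reused as task labels. The helper names `embed_center`, `make_program`, `reprogrammed_input`, and `map_labels` are hypothetical. Optimizing the weights by gradient descent through the frozen host model is not shown.

```python
import numpy as np

def embed_center(small, size):
    """Place a small HxWxC image at the center of a zero canvas (size, size, C)."""
    h, w, c = small.shape
    canvas = np.zeros((size, size, c), dtype=np.float32)
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w, :] = small
    return canvas

def make_program(weights, small_shape):
    """Assumed parameterization: tanh(W * M), with M = 0 over the embedded image."""
    size = weights.shape[0]
    h, w, _ = small_shape
    top, left = (size - h) // 2, (size - w) // 2
    mask = np.ones_like(weights)
    mask[top:top + h, left:left + w, :] = 0.0
    return np.tanh(weights * mask)

def reprogrammed_input(small, weights):
    """The same program is added to every embedded task input."""
    return embed_center(small, weights.shape[0]) + make_program(weights, small.shape)

def map_labels(host_probs, n_task_classes):
    """Hypothetical label mapping: host class i stands for task class i."""
    return int(np.argmax(host_probs[:n_task_classes]))

rng = np.random.default_rng(0)
weights = rng.normal(size=(224, 224, 3)).astype(np.float32)  # program parameters
digit = rng.uniform(size=(28, 28, 3)).astype(np.float32)     # stand-in for an MNIST image
x_adv = reprogrammed_input(digit, weights)
```

In this sketch `x_adv` would be fed to an unmodified ImageNet classifier, and `map_labels` would turn its output probabilities into a prediction for the new task. Only `weights` would be trained; the host model stays fixed.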
