Boosting the Adversarial Transferability of Surrogate Models with Dark Knowledge

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples. Moreover, adversarial examples are transferable: an adversarial example crafted for one DNN model can fool another model with non-trivial probability. This gave rise to the transfer-based attack, in which adversarial examples generated by a surrogate model are used to conduct black-box attacks. Some prior work generates adversarial examples with better transferability from a given surrogate model. However, training a special surrogate model to generate adversarial examples with better transferability is relatively under-explored. This paper proposes a method for training a surrogate model with dark knowledge to boost the transferability of the adversarial examples it generates. The trained surrogate model is called a dark surrogate model (DSM). The proposed training method consists of two key components: a teacher model that extracts dark knowledge, and a mixing augmentation technique that enhances the dark knowledge of the training data. We conducted extensive experiments showing that the proposed method substantially improves the adversarial transferability of surrogate models across different surrogate architectures and different optimizers for generating adversarial examples, and that it can be applied to other transfer-based attack scenarios that contain dark knowledge, such as face verification. Our code is publicly available at \url{https://github.com/ydc123/Dark_Surrogate_Model}.
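
The two components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mixing augmentation or the loss, so the sketch assumes a mixup-style linear interpolation of two inputs and a cross-entropy between the teacher's soft labels and the surrogate's predictions. All function names (`mix_inputs`, `dark_knowledge_loss`, and so on) and the toy models are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Log-probabilities computed via the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def mix_inputs(x_a, x_b, lam):
    """Assumed mixing augmentation: mixup-style interpolation of two inputs."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def soft_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy of the student's prediction against soft teacher labels."""
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, log_probs))


def dark_knowledge_loss(x_a, x_b, lam, teacher_fn, student_fn):
    """Loss for one mixed sample: the teacher labels the mixed input with a
    full probability vector (the "dark knowledge"), and the surrogate
    (student) is trained to match it instead of a one-hot label."""
    x_mixed = mix_inputs(x_a, x_b, lam)
    teacher_probs = softmax(teacher_fn(x_mixed))
    return soft_cross_entropy(student_fn(x_mixed), teacher_probs)


# Toy stand-ins for real networks (hypothetical, for illustration only).
def toy_teacher(x):
    return [x[0], x[1], 0.0]


def toy_untrained_student(x):
    return [0.0, 0.0, 0.0]


x_a, x_b = [0.0, 2.0], [2.0, 0.0]
loss_matched = dark_knowledge_loss(x_a, x_b, 0.5, toy_teacher, toy_teacher)
loss_untrained = dark_knowledge_loss(x_a, x_b, 0.5, toy_teacher, toy_untrained_student)
print(loss_matched, loss_untrained)
```

In this sketch the loss is smallest when the surrogate reproduces the teacher's output distribution on the mixed input, which is the sense in which the surrogate absorbs the teacher's dark knowledge. A real training loop would minimize this loss over batches with a gradient-based optimizer.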
