A Data-driven Adversarial Examples Recognition Framework via Adversarial Feature Genome

Abstract

Adversarial examples pose many security threats to convolutional neural networks (CNNs). Most defense algorithms counter these threats by finding differences between the original images and adversarial examples. However, the differences they find carry no features about the classes, so these defense algorithms can only detect adversarial examples without recovering the correct labels. To address this, we propose the Adversarial Feature Genome (AFG), a novel type of data that contains both the differences and features about the classes. The method is inspired by an observed phenomenon, Adversarial Feature Separability (AFS), in which the difference between the feature maps of the original images and those of adversarial examples grows with layer depth. Building on this, we develop an adversarial example recognition framework that detects adversarial examples and can recover the correct labels. In the experiments, detection and classification of adversarial examples with AFGs reach an accuracy of more than 90.01% in various attack scenarios. To the best of our knowledge, our method is the first to address both attack detection and label recovery. AFG offers a new data-driven perspective for improving the robustness of CNNs. The source code is available at https://github.com/GeoX-Lab/Adv_Fea_Genome.
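
The AFS observation can be checked numerically once per-layer feature maps are available. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each layer's feature map has already been extracted and flattened into a list of activations, and it uses a relative L2 distance as the difference measure, which the abstract does not specify. The function name `layerwise_feature_gap` and the toy values are hypothetical.

```python
import math


def layerwise_feature_gap(clean_feats, adv_feats, eps=1e-12):
    """Return one relative L2 distance per layer.

    clean_feats / adv_feats: lists with one entry per layer, ordered from
    shallow to deep; each entry is a flat list of activations.
    The relative L2 distance is an assumed metric, chosen for illustration.
    """
    gaps = []
    for f_clean, f_adv in zip(clean_feats, adv_feats):
        if len(f_clean) != len(f_adv):
            raise ValueError("feature maps of the same layer must have equal length")
        # Euclidean norm of the activation difference at this layer.
        diff = math.sqrt(sum((c - a) ** 2 for c, a in zip(f_clean, f_adv)))
        # Normalize by the clean activation norm so layers are comparable.
        scale = math.sqrt(sum(c * c for c in f_clean)) + eps
        gaps.append(diff / scale)
    return gaps


# Toy, made-up activations for two layers (shallow first, deep second).
clean = [[1.0, 2.0, 2.0], [3.0, 4.0]]
adv = [[1.0, 2.0, 2.0], [0.0, 0.0]]
gaps = layerwise_feature_gap(clean, adv)
```

If AFS holds for a given network and attack, the returned values would be expected to increase from the first entry to the last when real feature maps are supplied.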
