EAD: an ensemble approach to detect adversarial examples from the hidden features of deep neural networks

Abstract

One of the key challenges in Deep Learning is the definition of effective strategies for the detection of adversarial examples. To this end, we propose a novel approach named Ensemble Adversarial Detector (EAD) for the identification of adversarial examples in a standard multiclass classification scenario. EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained Deep Neural Network (DNN). Specifically, EAD integrates the state-of-the-art detectors based on Mahalanobis distance and on Local Intrinsic Dimensionality (LID) with a newly introduced method based on One-class Support Vector Machines (OSVMs). Although all constituting methods assume that the greater the distance of a test instance from the set of correctly classified training instances, the higher its probability of being an adversarial example, they differ in the way such distance is computed. In order to exploit the effectiveness of the different methods in capturing distinct properties of data distributions and, accordingly, efficiently tackle the trade-off between generalization and overfitting, EAD employs detector-specific distance scores as features of a logistic regression classifier, after independent hyperparameter optimization. We evaluated the EAD approach on distinct datasets (CIFAR-10, CIFAR-100 and SVHN) and models (ResNet and DenseNet) and with regard to four adversarial attacks (FGSM, BIM, DeepFool and CW), also comparing it with competing approaches. Overall, we show that EAD achieves the best AUROC and AUPR in the large majority of the settings and comparable performance in the others. The improvement over the state-of-the-art, and the possibility to easily extend EAD to include any arbitrary set of detectors, pave the way to a widespread adoption of ensemble approaches in the broad field of adversarial example detection.
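The ensemble step described in the abstract, where per-detector distance scores become the features of a logistic regression classifier, can be illustrated with a minimal sketch. This is not the authors' implementation. The three score columns are synthetic Gaussian stand-ins for the Mahalanobis, LID and OSVM scores, and the helper names (`make_scores`, `fit_logistic`, `predict_proba`) as well as the shift, learning rate and epoch count are assumptions chosen only for illustration. The classifier is fitted with plain batch gradient descent so that the example depends on the standard library alone.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Estimated probability that the instance with score vector x is adversarial.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_scores(n, adversarial, rng):
    # Hypothetical stand-ins for (Mahalanobis, LID, OSVM) scores:
    # adversarial instances are assumed to lie farther from the training data,
    # so their scores are drawn from a shifted Gaussian.
    shift = 1.5 if adversarial else 0.0
    return [[rng.gauss(shift, 1.0) for _ in range(3)] for _ in range(n)]


rng = random.Random(0)
X = make_scores(200, False, rng) + make_scores(200, True, rng)
y = [0] * 200 + [1] * 200

w, b = fit_logistic(X, y)
accuracy = sum(
    (predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y)
) / len(y)
```

In a real pipeline each column would instead come from a detector applied to the hidden features of the pre-trained DNN, with its hyperparameters tuned independently before the logistic regression is fitted. Adding a further detector would amount to adding one more feature column.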
