Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders

Abstract

The susceptibility of deep neural networks to adversarial attacks poses a major theoretical and practical challenge. All efforts to harden classifiers against such attacks have seen limited success. Two distinct categories of samples to which deep networks are vulnerable, "adversarial samples" and "fooling samples", have so far been tackled separately because of the difficulty of handling them together. In this work, we show how to address both within one unified framework. We tie a discriminative model to a generative model so that the adversarial objective entails a conflict. Our model takes the form of a variational autoencoder with a Gaussian mixture prior on the latent vector, in which each mixture component corresponds to one of the classes in the data. This enables selective classification: adversarial samples are rejected instead of being misclassified. Our method also inherently provides a way to learn, in a semi-supervised setting, a selective classifier that resists adversarial attacks. Finally, we show how the rejected adversarial samples can be reclassified.
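The abstract does not state the exact classification or rejection rule, so the following is only a minimal sketch of how a class-per-component Gaussian mixture prior can support selective classification. It assumes that an encoder (not shown) has already mapped an input to a latent vector `z`, that the mixture has equal weights and a shared isotropic variance `sigma`, and that rejection uses a hypothetical squared-distance threshold `max_sq_dist`. All function and parameter names are illustrative, not the authors' implementation, and the paper's actual criteria may differ.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def component_log_density(z: Vector, mean: Vector, sigma: float) -> float:
    """Log-density of z under an isotropic Gaussian N(mean, sigma^2 I)."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return (-0.5 * sq_dist / sigma**2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))


def class_posterior(z: Vector, means: Sequence[Vector], sigma: float) -> List[float]:
    """p(class k | z) under an equal-weight mixture with one component per class.

    Equal weights and a shared sigma are assumptions made for this sketch.
    """
    log_prior = math.log(1.0 / len(means))
    log_joint = [log_prior + component_log_density(z, m, sigma) for m in means]
    # Normalise with log-sum-exp to avoid underflow far from all components.
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]


def selective_classify(z: Vector, means: Sequence[Vector], sigma: float,
                       max_sq_dist: float) -> Optional[int]:
    """Return a class index, or None to reject the input."""
    posterior = class_posterior(z, means, sigma)
    label = max(range(len(means)), key=posterior.__getitem__)
    # Hypothetical rejection rule: reject when z lies too far from the mean
    # of its most probable component, i.e. in a low-density region of the prior.
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, means[label]))
    if sq_dist > max_sq_dist:
        return None
    return label


means = [[-3.0, 0.0], [3.0, 0.0]]  # toy 2-class, 2-D latent space
print(selective_classify([2.5, 0.5], means, sigma=1.0, max_sq_dist=9.0))   # 1
print(selective_classify([0.0, 10.0], means, sigma=1.0, max_sq_dist=9.0))  # None (rejected)
```

The posterior argmax chooses the class, and the distance check decides whether to trust that choice. With equal weights and a shared variance, the argmax is simply the nearest component mean, so a rejected `z` is far from every class component rather than merely ambiguous between two of them.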
