Adversarial Attacks on Variational Autoencoders

Abstract

Adversarial attacks are malicious inputs that derail machine-learning models. We propose a scheme to attack autoencoders, as well as a quantitative evaluation framework that correlates well with the qualitative assessment of the attacks. We assess, with statistically validated experiments, the resistance to attacks of three variational autoencoders (simple, convolutional, and DRAW) on three datasets (MNIST, SVHN, CelebA), showing that both DRAW's recurrence and attention mechanism lead to better resistance. As autoencoders are proposed for compressing data, a scenario in which their safety is paramount, we expect more attention will be given to adversarial attacks on them.
