The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks

Abstract

Deep Generative Models (DGMs) allow users to synthesize data from complex, high-dimensional manifolds. Industry applications of DGMs include data augmentation to boost the performance of (semi-)supervised machine learning, or to mitigate fairness or privacy concerns. Large-scale DGMs are notoriously hard to train, requiring expert skills, large amounts of data and extensive computational resources. Thus, it can be expected that many enterprises will resort to sourcing pre-trained DGMs from potentially unverified third parties, e.g. open-source model repositories. As we show in this paper, such a deployment scenario poses a new attack surface, which allows adversaries to potentially undermine the integrity of entire machine learning development pipelines in a victim organization. Specifically, we describe novel training-time attacks resulting in corrupted DGMs that synthesize regular data under normal operations and designated target outputs for inputs sampled from a trigger distribution. Depending on the control that the adversary has over the random number generation, this imposes various degrees of risk that harmful data may enter the machine learning development pipelines, potentially causing material or reputational damage to the victim organization. Our attacks are based on adversarial loss functions that combine the dual objectives of attack stealth and fidelity. We show their effectiveness for a variety of DGM architectures (Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs)) and data domains (images, audio). Our experiments show that, even for large-scale industry-grade DGMs, our attacks can be mounted with only modest computational effort. We also investigate the effectiveness of different defensive approaches (based on static/dynamic model and output inspections) and prescribe a practical defense strategy that paves the way for safe usage of DGMs.
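
To make the dual objective of stealth and fidelity concrete, the following is a minimal sketch of one plausible form such a loss could take. It is not taken from the abstract: the function name `backdoor_loss`, the weight `lam`, the squared-error distances, the toy linear generator, and the use of a single trigger point in place of a trigger distribution are all assumptions made for illustration.

```python
import numpy as np

def backdoor_loss(g_star, g_ref, z_batch, z_trigger, x_target, lam=1.0):
    """Hypothetical combined objective (an assumption, not the paper's exact loss)."""
    # Fidelity: the corrupted generator should map the trigger to the target output.
    fidelity = np.sum((g_star(z_trigger) - x_target) ** 2)
    # Stealth: on ordinary latent samples it should match the benign generator.
    diffs = np.stack([g_star(z) - g_ref(z) for z in z_batch])
    stealth = np.mean(np.sum(diffs ** 2, axis=1))
    return lam * fidelity + stealth

# Toy setup: a linear "generator" from a 2-d latent space to 3-d outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
g_ref = lambda z: W @ z                      # benign reference generator
z_batch = rng.normal(size=(16, 2))           # samples from the normal latent prior
z_trigger = np.array([5.0, -5.0])            # illustrative single trigger input
x_target = np.ones(3)                        # illustrative target output

def g_backdoored(z):
    # Idealized corrupted generator: target on the trigger, benign elsewhere.
    return x_target if np.allclose(z, z_trigger) else g_ref(z)

loss_clean = backdoor_loss(g_ref, g_ref, z_batch, z_trigger, x_target)
loss_backdoored = backdoor_loss(g_backdoored, g_ref, z_batch, z_trigger, x_target)
```

In this toy setting the unmodified generator is perfectly stealthy but pays the full fidelity penalty, while the idealized backdoored generator drives both terms to zero. A real attack would instead minimize such a loss over the parameters of a neural generator.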
