Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness

Abstract

We introduce a Noise-based prior Learning (NoL) approach for training neural networks that are intrinsically robust to adversarial attacks. We find that implicit generative modeling of random noise, with the same loss function used during posterior maximization, improves a model's understanding of the data manifold, furthering adversarial robustness. We evaluate our approach's efficacy and provide a simple visualization tool for understanding adversarial data, using Principal Component Analysis. Our analysis reveals that adversarial robustness, in general, manifests in models with higher variance along the high-ranked principal components. We show that models learnt with our approach perform remarkably well against a wide range of attacks. Furthermore, combining NoL with state-of-the-art adversarial training extends the robustness of a model, even beyond what it is adversarially trained for, in both white-box and black-box attack scenarios.
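
The quantity the abstract refers to, variance along the high-ranked principal components, can be illustrated with a minimal sketch. This is not the paper's visualization tool: the function name `pca_variance_profile`, the synthetic activations, and the choice of `k` are all assumptions made here for illustration. It uses only NumPy and computes the per-component variances from the singular values of the centered data.

```python
import numpy as np

def pca_variance_profile(activations, k=5):
    """Variance along the top-k principal components, plus their share of total variance.

    `activations` is an (n_samples, n_features) array; hypothetical helper,
    not an API from the paper.
    """
    X = np.asarray(activations, dtype=np.float64)
    X = X - X.mean(axis=0, keepdims=True)      # center each feature
    s = np.linalg.svd(X, compute_uv=False)     # singular values, sorted descending
    var = s ** 2 / (X.shape[0] - 1)            # variance along each principal component
    return var[:k], var[:k].sum() / var.sum()

# Synthetic stand-in for a layer's activations (assumption: 200 samples, 10 features),
# scaled so that a few directions carry most of the variance.
rng = np.random.default_rng(0)
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
acts = rng.normal(size=(200, 10)) * scales

top_var, frac = pca_variance_profile(acts, k=2)
print("variance along top-2 PCs:", top_var)
print("fraction of total variance:", frac)
```

Under these assumptions, comparing `top_var` for activations collected from two differently trained models on the same inputs would be one way to check whether one of them concentrates more variance in its leading components.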
