Hidden Trigger Backdoor Attacks

Abstract

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks in which the attacker provides poisoned data for the victim to train the model with, and then activates the attack by showing a specific trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise and perturbation to hide the trigger. We propose a novel form of backdoor attack in which the poisoned data look natural and carry correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps it secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, even though the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks.
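The test-time step described above, pasting the trigger at a random location on an unseen image, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `paste_trigger`, the 224x224 image size, and the 30x30 white patch are all assumptions chosen for illustration, and the sketch covers only the activation step, not how the poisoned training data are generated.

```python
import numpy as np


def paste_trigger(image, trigger, rng):
    """Return a copy of `image` with `trigger` pasted at a random location.

    Hypothetical helper for illustration; both arrays are H x W x C.
    """
    h, w = image.shape[:2]
    th, tw = trigger.shape[:2]
    if th > h or tw > w:
        raise ValueError("trigger must fit inside the image")
    # Sample the top-left corner so the patch lies fully inside the image.
    top = int(rng.integers(0, h - th + 1))
    left = int(rng.integers(0, w - tw + 1))
    patched = image.copy()  # leave the clean image untouched
    patched[top:top + th, left:left + tw] = trigger
    return patched, (top, left)


# Assumed sizes: a 224x224 RGB image and a 30x30 white patch as the trigger.
rng = np.random.default_rng(0)
image = np.zeros((224, 224, 3), dtype=np.uint8)
trigger = np.full((30, 30, 3), 255, dtype=np.uint8)
patched, (top, left) = paste_trigger(image, trigger, rng)
```

In an evaluation along the lines the abstract describes, one would feed `patched` to the trained classifier and compare its prediction with the prediction on the clean `image`.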
