Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification

Abstract

Backdoor injection attacks are a threat to machine learning models that are trained on large amounts of data collected from untrusted sources; these attacks enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and their impact on the benign learning task; however, an open question is how much poison data is needed for a successful backdoor attack. Typical attacks either use few samples but need extensive information about the data points, or need to poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring error and without significantly impacting the benign learning task performance. Moreover, we prove the one-poison hypothesis for linear regression and linear classification. For adversaries that use, for the poison sample, a direction that is unused by the benign data distribution, we show that the resulting model is functionally equivalent to a model where the poison was excluded from training. We build on prior work on statistical backdoor learning to show that in all other cases, the impact on the benign learning task is still limited. We also validate our theoretical results experimentally with realistic benchmark data sets.
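
The unused-direction case described in the abstract can be illustrated with a small numerical sketch. This is not the paper's construction or experimental setup. It assumes ordinary least squares (minimum-norm solution via NumPy), synthetic Gaussian data whose last coordinate is always zero, and a hypothetical poison sample placed on that unused coordinate; the names `x_poison`, `y_poison`, and `w_true` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5

# Benign data: the last coordinate is never used (assumption for this sketch).
X = rng.normal(size=(n, d))
X[:, -1] = 0.0
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

# A single hypothetical poison sample that lives only in the unused direction.
x_poison = np.zeros(d)
x_poison[-1] = 1.0
y_poison = 10.0  # target shift the attacker wants the trigger to cause

# Fit with and without the poison sample (minimum-norm least squares).
w_clean, *_ = np.linalg.lstsq(X, y, rcond=None)
X_pois = np.vstack([X, x_poison])
y_pois = np.append(y, y_poison)
w_pois, *_ = np.linalg.lstsq(X_pois, y_pois, rcond=None)

# Fresh benign inputs: same distribution, last coordinate still zero.
X_test = rng.normal(size=(50, d))
X_test[:, -1] = 0.0
benign_clean = X_test @ w_clean
benign_pois = X_test @ w_pois

# Triggered inputs: benign inputs plus the poison direction.
triggered = (X_test + x_poison) @ w_pois

print("max benign prediction gap:", np.max(np.abs(benign_clean - benign_pois)))
print("mean shift under trigger:", np.mean(triggered - benign_pois))
```

In this toy setting the normal equations decouple: the poison sample only determines the weight on the unused coordinate, so predictions on benign inputs agree up to floating-point error while the trigger shifts the output by `y_poison`. Regularized or gradient-trained models, as well as the classification case, would need their own analysis.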
