HINT: Healthy Influential-Noise based Training to Defend against Data Poisoning Attacks

Abstract

While numerous defense methods have been proposed to prevent potential poisoning attacks from untrusted data sources, most research works only defend against specific attacks, which leaves many avenues for an adversary to exploit. In this work, we propose an efficient and robust training approach to defend against data poisoning attacks based on influence functions, named Healthy Influential-Noise based Training (HINT). Using influence functions, we craft healthy noise that helps to harden the classification model against poisoning attacks without significantly affecting the generalization ability on test data. In addition, our method can perform effectively when only a subset of the training data is modified, rather than adding noise to all examples as in several previous works. We conduct comprehensive evaluations over two image datasets with state-of-the-art poisoning attacks under different realistic attack scenarios. Our empirical results show that HINT can efficiently protect deep learning models against the effect of both untargeted and targeted poisoning attacks.
