Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models

Abstract

Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities (score-based attacks), neither of which is available in most real-world scenarios. In many such cases one currently needs to retreat to transfer-based attacks which rely on cumbersome substitute models, need access to the training data and can be defended against. Here we emphasise the importance of attacks which solely rely on the final model decision. Such decision-based attacks (1) are applicable to real-world black-box models such as autonomous cars, (2) need less knowledge and are easier to apply than transfer-based attacks and (3) are more robust to simple defences than gradient- or score-based attacks. Previous attacks in this category were limited to simple models or simple datasets. Here we introduce the Boundary Attack, a decision-based attack that starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial. The attack is conceptually simple, requires close to no hyperparameter tuning, does not rely on substitute models and is competitive with the best gradient-based attacks in standard computer vision tasks like ImageNet. We apply the attack to two black-box algorithms from Clarifai.com. The Boundary Attack in particular and the class of decision-based attacks in general open new avenues to study the robustness of machine learning models and raise new questions regarding the safety of deployed machine learning systems. An implementation of the attack is available as part of Foolbox at https://github.com/bethgelab/foolbox.
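
The core idea of a decision-based attack, starting from a large adversarial perturbation and shrinking it while staying adversarial, can be illustrated with a minimal sketch. This is not the Boundary Attack as implemented in the paper or in Foolbox: it is a simplified random walk with rejection, and the toy classifier `decision`, the function `decision_based_attack`, and the parameters `noise` and `shrink` are all hypothetical names chosen for illustration. The only assumption it shares with the abstract is that the attacker sees nothing but the final label.

```python
import numpy as np


def decision(x):
    """Hypothetical black-box model: exposes only the final label.

    A toy linear classifier stands in for the real model here.
    """
    return int(x.sum() > 0.0)


def decision_based_attack(original, start, steps=2000, noise=0.05,
                          shrink=0.05, seed=0):
    """Simplified decision-based random walk (illustrative assumption).

    Starts from an input that is already misclassified and repeatedly
    proposes candidates that are closer to the original, keeping only
    those the model still misclassifies.
    """
    rng = np.random.default_rng(seed)
    true_label = decision(original)
    if decision(start) == true_label:
        raise ValueError("start must already be adversarial")

    current = start.copy()
    for _ in range(steps):
        dist = np.linalg.norm(current - original)
        # Random exploration step, scaled by the current distance.
        candidate = current + noise * dist * rng.standard_normal(current.shape)
        # Small contraction towards the original input.
        candidate = candidate + shrink * (original - candidate)
        # Accept only if still adversarial and strictly closer than before.
        still_adversarial = decision(candidate) != true_label
        closer = np.linalg.norm(candidate - original) < dist
        if still_adversarial and closer:
            current = candidate
    return current


original = np.full(10, 0.5)   # toy input, classified as 1
start = np.full(10, -5.0)     # large perturbation, classified as 0
adversarial = decision_based_attack(original, start)
```

Because a candidate is accepted only when the label is still wrong and the distance has dropped, the result remains adversarial and is never farther from the original than the starting point. The sketch queries `decision` once per step, so the query count is the main cost.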
