Optimal Attacks on Reinforcement Learning Policies

Abstract

Control policies trained using Deep Reinforcement Learning have recently been shown to be vulnerable to adversarial attacks that introduce even very small perturbations to the policy input. The attacks proposed so far have been designed using heuristics, and build on existing adversarial example crafting techniques used to dupe classifiers in supervised learning. In contrast, this paper investigates the problem of devising optimal attacks with respect to a well-defined attacker's objective, e.g., minimizing the main agent's average reward. When the policy and the system dynamics, as well as the rewards, are known to the attacker, a scenario referred to as a white-box attack, designing optimal attacks amounts to solving a Markov Decision Process. For what we call black-box attacks, where neither the policy nor the system is known, optimal attacks can be trained using Reinforcement Learning techniques. Through numerical experiments, we demonstrate the efficiency of our attacks compared to existing attacks (usually based on gradient methods). We further quantify the potential impact of attacks and establish its connection to the smoothness of the policy under attack. Smooth policies are naturally less prone to attacks (this explains why policies that are Lipschitz with respect to the state are more resilient). Finally, we show that, from the main agent's perspective, the system uncertainties and the attacker can be modeled as a Partially Observable Markov Decision Process (POMDP). We demonstrate that using Reinforcement Learning techniques tailored to POMDPs (e.g., using Recurrent Neural Networks) leads to more resilient policies.
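
To make the white-box claim concrete, the following is a minimal sketch, not the paper's implementation. It assumes a small tabular system with known transitions `P` and rewards `R`, a fixed victim policy indexed by the state it observes, and an attacker who may shift the observed state index by at most `eps`. The names `attacker_q` and `optimal_attack`, the index-distance perturbation set, and the random toy system are all hypothetical choices made for illustration. Under these assumptions the attacker faces an ordinary MDP whose actions are the allowed observations and whose reward is the negative of the victim's reward, so plain value iteration yields an optimal attack.

```python
import numpy as np

def attacker_q(P, R, victim_policy, V, eps, gamma):
    """Attacker's Q-values: Q[s, o] is the value of showing the victim
    observation o when the true state is s (-inf if o is not allowed)."""
    n_states = P.shape[0]
    Q = np.full((n_states, n_states), -np.inf)
    for s in range(n_states):
        # Assumed perturbation set: observed index within eps of the true one.
        for o in range(max(0, s - eps), min(n_states, s + eps + 1)):
            a = victim_policy[o]  # the victim acts on what it observes
            # The attacker's reward is the negative of the victim's reward.
            Q[s, o] = -R[s, a] + gamma * (P[s, a] @ V)
    return Q

def optimal_attack(P, R, victim_policy, eps=1, gamma=0.9, iters=300):
    """Value iteration on the attacker's MDP.

    P: (S, A, S) transition probabilities, R: (S, A) victim rewards,
    victim_policy: (S,) action taken for each observed state.
    Returns the observation to show in each true state and the attacker's values.
    """
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = attacker_q(P, R, victim_policy, V, eps, gamma).max(axis=1)
    Q = attacker_q(P, R, victim_policy, V, eps, gamma)
    return Q.argmax(axis=1), V

# Toy system (random, purely illustrative).
rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # normalize rows into probabilities
R = rng.random((S, A))
victim_policy = R.argmax(axis=1)    # hypothetical myopic victim

attack, V_attack = optimal_attack(P, R, victim_policy, eps=1)
_, V_none = optimal_attack(P, R, victim_policy, eps=0)
print("observation shown in each state:", attack)
print("attacker gain per state:", V_attack - V_none)
```

With `eps=0` the attacker can only show the true state, so the gap `V_attack - V_none` measures how much the perturbation budget is worth to the attacker in each state. The black-box setting described in the abstract would replace the known `P` and `R` with samples and a learning algorithm, which this sketch does not cover.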
