Adversarial Policies: Attacking Deep Reinforcement Learning

Abstract

Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent's observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at https://adversarialpolicies.github.io/.
