Execute Order 66: Targeted Data Poisoning for Reinforcement Learning

Abstract

Data poisoning for reinforcement learning has historically focused on general performance degradation, and targeted attacks have been successful via perturbations that involve control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states, all while minimally modifying a small fraction of training observations without assuming any control over policy or reward. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate success in two Atari games of varying difficulty.
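
For readers unfamiliar with gradient alignment, the following is a minimal sketch of the idea as it is usually formulated, not the paper's implementation. It assumes the attacker scores a poisoned observation by one minus the cosine similarity between the gradient of an adversarial loss at the target state and the training gradient produced by the poisoned observation. The linear model, the squared loss, and every name here (`squared_loss_grad`, `alignment_loss`, the toy vectors) are hypothetical stand-ins chosen so the example runs without a deep learning library.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def squared_loss_grad(weights, observation, label):
    """Gradient of (w . x - y)^2 with respect to w, for a toy linear model."""
    residual = dot(weights, observation) - label
    return [2.0 * residual * x for x in observation]


def alignment_loss(grad_adversarial, grad_poison):
    """One minus the cosine similarity of two flattened gradient vectors.

    Lower values mean that training on the poisoned data pushes the
    parameters in nearly the same direction as the attacker's objective.
    """
    norm_product = math.sqrt(dot(grad_adversarial, grad_adversarial)) * math.sqrt(
        dot(grad_poison, grad_poison)
    )
    if norm_product == 0.0:
        return 1.0  # assumption: treat a zero gradient as unaligned
    return 1.0 - dot(grad_adversarial, grad_poison) / norm_product


# Toy illustration with hypothetical numbers.
weights = [0.5, -0.25]
target_grad = squared_loss_grad(weights, [1.0, 0.0], 1.0)  # attacker's objective
clean_loss = alignment_loss(target_grad, squared_loss_grad(weights, [0.0, 1.0], 0.0))
# Slightly perturb the first feature of the training observation.
poisoned_loss = alignment_loss(target_grad, squared_loss_grad(weights, [0.2, 1.0], 0.0))
print(clean_loss, poisoned_loss)
```

In this toy case the small perturbation of the observation lowers the alignment loss relative to the clean observation. An actual attack would search for such perturbations under a size budget and would use the victim agent's own loss and network, neither of which is specified in this abstract.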
