Regret-Based Defense in Adversarial Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) policies have been shown to be vulnerable to small adversarial noise in observations. Such adversarial noise can have disastrous consequences in safety-critical environments. For instance, it can be fatal for a self-driving car to receive adversarially perturbed sensory observations about nearby signs (e.g., a stop sign physically altered to be perceived as a speed limit sign) or objects (e.g., cars altered to be recognized as trees). Existing approaches for making RL algorithms robust to an observation-perturbing adversary iteratively improve against adversarial examples generated at each iteration. While such approaches have been shown to provide improvements over regular RL methods, they are reactive and can fare significantly worse if certain categories of adversarial examples are not generated during training. To that end, we pursue a more proactive approach that relies on directly optimizing a well-studied robustness measure, regret, instead of expected value. We provide a principled approach that minimizes maximum regret over a "neighborhood" of observations around the received observation. Our regret criterion can be used to modify existing value- and policy-based Deep RL methods. We demonstrate that our approaches provide a significant improvement in performance across a wide variety of benchmarks against leading approaches for robust Deep RL.
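To make the idea of minimizing maximum regret over a neighborhood concrete, the following is a minimal tabular sketch and not the paper's algorithm. It assumes a small, already-known table of action values and an explicit list of states that the perturbed observation could correspond to. The names `minimax_regret_action`, `q_table`, and `neighborhood`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
def minimax_regret_action(q_table, neighborhood):
    """Pick the action whose worst-case regret over `neighborhood` is smallest.

    q_table: dict mapping state -> list of action values (hypothetical toy table).
    neighborhood: states the true state could be, given the received observation.
    """
    num_actions = len(q_table[neighborhood[0]])
    best_action, best_worst_regret = None, float("inf")
    for action in range(num_actions):
        # Regret of `action` in state s is the gap to the best action value in s;
        # we take the largest such gap over all states in the neighborhood.
        worst_regret = max(
            max(q_table[s]) - q_table[s][action] for s in neighborhood
        )
        if worst_regret < best_worst_regret:
            best_action, best_worst_regret = action, worst_regret
    return best_action, best_worst_regret


# Toy scenario (assumed values): the observed sign reads "speed_limit",
# but the true state may be "stop_sign". Actions: 0=brake, 1=accelerate, 2=coast.
q_table = {
    "stop_sign": [10.0, -100.0, 4.0],
    "speed_limit": [5.0, 8.0, 6.0],
}

# Acting greedily on the received observation alone.
greedy_action = max(range(3), key=lambda a: q_table["speed_limit"][a])

# Acting to minimize maximum regret over the neighborhood of the observation.
action, regret = minimax_regret_action(q_table, ["stop_sign", "speed_limit"])
print(greedy_action, action, regret)
```

In this toy setting, the greedy choice on the received observation is to accelerate (action 1), which would incur a regret of 110 if the sign were actually a stop sign. The minimax-regret choice is to brake (action 0), whose worst-case regret across both candidate states is 3.0.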
