Query-based Targeted Action-Space Adversarial Policies on Deep Reinforcement Learning Agents

Abstract

Advances in computing resources have resulted in the increasing complexity of cyber-physical systems (CPS). As the complexity of CPS has grown, the focus has shifted from traditional control methods to deep reinforcement learning (DRL)-based methods for control of these systems. This is due to the difficulty of obtaining accurate models of complex CPS for traditional control. However, to securely deploy DRL in production, it is essential to examine the weaknesses of DRL-based controllers (policies) to malicious attacks from all angles. In this work, we investigate targeted attacks in the action-space domain, also commonly known as actuation attacks in the CPS literature, which perturb the outputs of a controller. We show that a query-based black-box attack model that generates optimal perturbations with respect to an adversarial goal can be formulated as another reinforcement learning problem. Thus, such an adversarial policy can be trained using conventional DRL methods. Experimental results show that adversarial policies that only observe the nominal policy's output generate stronger attacks than adversarial policies that observe the nominal policy's input and output. Further analysis reveals that nominal policies whose outputs are frequently at the boundaries of the action space are naturally more robust to adversarial policies. Lastly, we propose the use of adversarial training with transfer learning to induce robust behaviors in the nominal policy, which decreases the rate of successful targeted attacks by half.
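To make the reinforcement learning formulation of the attack concrete, the following minimal sketch wraps a plant and a fixed nominal policy into an environment seen from the adversary's side. The adversary observes only the nominal policy's output, chooses a bounded perturbation, and is rewarded for moving the system toward its own target. Everything here is an illustrative assumption rather than the setup used in the paper: the one-dimensional `ToyPlant`, the hand-written `nominal_policy`, the perturbation `budget`, the distance-based reward, and the constant stand-in adversary used in place of a trained one.

```python
def clip(v, lo, hi):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


class ToyPlant:
    """Hypothetical 1-D plant: the state x moves by 0.1 * action per step."""

    def __init__(self):
        self.x = 0.0

    def reset(self):
        self.x = 0.0
        return self.x

    def step(self, action):
        # The actuator saturates at the action-space bounds [-1, 1].
        self.x += 0.1 * clip(action, -1.0, 1.0)
        return self.x


def nominal_policy(x):
    """Stand-in for a trained DRL controller: drives x toward 1."""
    return clip(5.0 * (1.0 - x), -1.0, 1.0)


class AdversaryEnv:
    """Adversary's decision problem (assumed form, not the paper's exact one).

    Observation: the nominal action only (a black-box query of the policy).
    Action:      an additive perturbation, clipped to [-budget, budget].
    Reward:      negative distance of the plant state to the adversarial target.
    """

    def __init__(self, plant, policy, target=0.0, budget=0.5):
        self.plant, self.policy = plant, policy
        self.target, self.budget = target, budget
        self.nominal_action = 0.0

    def reset(self):
        self.nominal_action = self.policy(self.plant.reset())
        return self.nominal_action

    def step(self, delta):
        delta = clip(delta, -self.budget, self.budget)
        x = self.plant.step(self.nominal_action + delta)
        reward = -abs(x - self.target)
        self.nominal_action = self.policy(x)  # next query result
        return self.nominal_action, reward


def rollout(adversary, steps=50):
    """Run one episode; `adversary` maps an observation to a perturbation."""
    env = AdversaryEnv(ToyPlant(), nominal_policy)
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        obs, reward = env.step(adversary(obs))
        total += reward
    return env.plant.x, total


x_clean, r_clean = rollout(lambda obs: 0.0)    # no attack
x_attack, r_attack = rollout(lambda obs: -0.5)  # constant stand-in adversary
```

Because `AdversaryEnv` exposes the usual reset/step interface, the constant adversary could in principle be swapped for a policy trained with any conventional DRL algorithm. In this toy setting the unattacked plant settles at the nominal goal, while the constant perturbation holds it short of that goal, which is what the adversary's reward favors.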
