Online Robust Policy Learning in the Presence of Unknown Adversaries

Abstract

The growing prospect of deep reinforcement learning (DRL) being used in cyber-physical systems has raised concerns around the safety and robustness of autonomous agents. Recent work on generating adversarial attacks has shown that it is computationally feasible for a bad actor to fool a DRL policy into behaving suboptimally. Although certain adversarial attacks with specific attack models have been addressed, most studies are only interested in off-line optimization in the data space (e.g., example fitting, distillation). This paper introduces a Meta-Learned Advantage Hierarchy (MLAH) framework that is attack model-agnostic and more suited to reinforcement learning, by handling the attacks in the decision space (as opposed to the data space) and directly mitigating learned bias introduced by the adversary. In MLAH, we learn separate sub-policies (nominal and adversarial) in an online manner, as guided by a supervisory master agent that detects the presence of the adversary by leveraging the advantage function for the sub-policies. We demonstrate that the proposed algorithm enables policy learning with significantly lower bias as compared to state-of-the-art policy learning approaches, even in the presence of heavy state information attacks. We present algorithm analysis and simulation results using popular OpenAI Gym environments.
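
To make the idea of advantage-guided switching between sub-policies concrete, the following is a minimal illustrative sketch, not the MLAH algorithm itself. It assumes a one-step temporal-difference advantage estimate and a simple thresholded moving average as the detection rule; the names `td_advantage` and `MasterSelector`, the window size, and the threshold are all hypothetical choices made for illustration only.

```python
from collections import deque


def td_advantage(reward, value_s, value_next, gamma=0.99):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


class MasterSelector:
    """Hypothetical supervisor: chooses a sub-policy from recent advantages.

    Assumption (not from the paper): an unexpectedly low average advantage
    under the current behavior is treated as evidence of an adversary.
    """

    def __init__(self, window=5, threshold=-0.5):
        self.recent = deque(maxlen=window)  # sliding window of advantages
        self.threshold = threshold

    def update(self, advantage):
        """Record one advantage sample and return the sub-policy to use."""
        self.recent.append(advantage)
        mean_adv = sum(self.recent) / len(self.recent)
        return "adversarial" if mean_adv < self.threshold else "nominal"
```

In this sketch, the selector returns "nominal" while observed returns match the value estimates and switches to "adversarial" once the windowed mean advantage drops below the threshold. A learned master policy, as described in the abstract, would replace this fixed rule.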
