Probabilistic Perspectives on Error Minimization in Adversarial Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) policies are critically vulnerable to adversarial noise in observations, posing severe risks in safety-critical scenarios. For example, a self-driving car that receives manipulated sensory inputs about traffic signs could cause catastrophic outcomes. Existing strategies to fortify RL algorithms against such adversarial perturbations generally fall into two categories: (a) using regularization methods that enhance robustness by incorporating adversarial loss terms into the value objectives, and (b) adopting "maximin" principles, which focus on maximizing the minimum value to ensure robustness. While regularization methods reduce the likelihood of successful attacks, their effectiveness drops significantly if an attack does succeed. On the other hand, maximin objectives, although robust, tend to be overly conservative. To address this challenge, we introduce a novel objective called Adversarial Counterfactual Error (ACoE), which naturally balances optimizing value and robustness against adversarial attacks. To optimize ACoE in a scalable manner in model-free settings, we propose a theoretically justified surrogate objective known as Cumulative-ACoE (C-ACoE). The core idea in optimizing C-ACoE is to use the belief about the underlying true state given the adversarially perturbed observation. Our empirical evaluations demonstrate that our method outperforms current state-of-the-art approaches for addressing adversarial RL problems across all established benchmarks (MuJoCo, Atari, and Highway) used in the literature.
