Abstract
Deep Reinforcement Learning (DRL) policies are highly susceptible to adversarial noise in observations, which poses significant risks in safety-critical scenarios. For instance, a self-driving car could experience catastrophic consequences if its sensory inputs about traffic signs are manipulated by an adversary. The core challenge in such situations is that the true state of the environment becomes only partially observable due to these adversarial manipulations. Two key strategies have been employed in the literature so far. The first set of methods focuses on increasing the likelihood that nearby states (those close to the true state) share the same robust actions. The second set of approaches maximizes the value for the worst possible true state within the range of adversarially perturbed observations. Although these approaches provide strong robustness against attacks, they tend to be either overly conservative or not generalizable. We hypothesize that the shortcomings of these approaches stem from their failure to explicitly account for partial observability. By making decisions that directly consider this partial knowledge of the true state, we believe it is possible to achieve a better balance between robustness and performance, particularly in adversarial settings. To achieve this, we introduce a novel objective called Adversarial Counterfactual Error (ACoE), which is defined on the beliefs about the underlying true state and naturally balances value optimization with robustness against adversarial attacks, together with a theoretically grounded, scalable surrogate objective called Cumulative-ACoE (C-ACoE). Our empirical evaluations demonstrate that our method significantly outperforms current state-of-the-art approaches for addressing adversarial RL challenges, offering a promising direction for improving DRL under adversarial conditions.
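As a minimal illustration of why worst-case value maximization can be overly conservative, the sketch below contrasts a maximin action choice with a choice weighted by a belief over candidate true states. It is not the ACoE or C-ACoE objective, which this abstract does not define. The toy value table, the belief, and both function names are hypothetical and chosen only for illustration.

```python
def maximin_action(q_values, candidate_states):
    """Pick the action whose worst-case value over the candidate true states is highest."""
    n_actions = len(q_values[0])
    worst_case = [
        min(q_values[s][a] for s in candidate_states) for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: worst_case[a])


def belief_weighted_action(q_values, belief):
    """Pick the action with the highest expected value under a belief over true states.

    Assumption: `belief` maps state index -> probability. This is an illustrative
    contrast only, not the objective proposed in the paper.
    """
    n_actions = len(q_values[0])
    expected = [
        sum(p * q_values[s][a] for s, p in belief.items()) for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: expected[a])


# Hypothetical toy values: rows are true states, columns are actions.
Q = [
    [10.0, 4.0],  # state 0
    [9.0, 4.5],   # state 1
    [0.0, 5.0],   # state 2: unlikely, but consistent with the perturbed observation
]
candidates = [0, 1, 2]
belief = {0: 0.6, 1: 0.35, 2: 0.05}

a_maximin = maximin_action(Q, candidates)
a_belief = belief_weighted_action(Q, belief)
```

In this toy setting the maximin rule selects the second action because of a single unlikely state, while the belief-weighted rule selects the first. This is the kind of conservatism that accounting for partial observability is meant to reduce.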