Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis

Abstract

Meta reinforcement learning (meta RL), as a combination of meta-learning ideas and reinforcement learning (RL), enables the agent to adapt to different tasks using a few samples. However, this sampling-based adaptation also makes meta RL vulnerable to adversarial attacks. By manipulating the reward feedback from sampling processes in meta RL, an attacker can mislead the agent into building wrong knowledge from training experience, which deteriorates the agent's performance when dealing with different tasks after adaptation. This paper provides a game-theoretical underpinning for understanding this type of security risk. In particular, we formally define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation. This formulation leads to two online attack schemes, Intermittent Attack and Persistent Attack, which enable the attacker to learn an optimal sampling attack, defined by an $\epsilon$-first-order stationary point, within $\mathcal{O}(\epsilon^{-2})$ iterations. These attack schemes free-ride on the agent's learning progress concurrently, without extra interactions with the environment. By corroborating the convergence results with numerical experiments, we observe that a minor effort of the attacker can significantly deteriorate the learning performance, and that the minimax approach can also help robustify meta RL algorithms.
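To make the stopping criterion concrete, the following minimal sketch runs gradient descent-ascent on a toy minimax problem until the gradient norm drops below $\epsilon$, which is one common reading of an $\epsilon$-first-order stationary point. Everything here is an assumption for illustration: the quadratic objective, the step size, and the function names are hypothetical and are not the paper's attack objective or its Intermittent or Persistent Attack schemes, whose exact definitions the abstract does not give.

```python
import math


def toy_objective_grads(x, y):
    """Gradients of the hypothetical toy objective
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
    where x plays the minimizing role and y the maximizing role."""
    return x + y, x - y


def gradient_descent_ascent(x, y, step=0.1, eps=1e-3, max_iters=10_000):
    """Run simultaneous gradient descent (on x) and ascent (on y).

    Stops as soon as the full gradient norm is at most eps, i.e. when
    (x, y) is an eps-first-order stationary point of the toy objective.
    Returns the final point, its gradient norm, and the iteration count.
    """
    for k in range(max_iters):
        gx, gy = toy_objective_grads(x, y)
        grad_norm = math.hypot(gx, gy)
        if grad_norm <= eps:
            return x, y, grad_norm, k
        x -= step * gx  # minimizing player descends
        y += step * gy  # maximizing player ascends
    # Budget exhausted: report the gradient norm at the final point.
    gx, gy = toy_objective_grads(x, y)
    return x, y, math.hypot(gx, gy), max_iters


if __name__ == "__main__":
    x, y, grad_norm, iters = gradient_descent_ascent(1.0, 1.0)
    print(f"stopped after {iters} iterations, gradient norm {grad_norm:.2e}")
```

The toy objective is strongly convex in `x` and strongly concave in `y`, so this simple scheme converges quickly; the $\mathcal{O}(\epsilon^{-2})$ rate stated in the abstract concerns the paper's own, harder setting and is not reproduced by this sketch.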
