TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning

Abstract

Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making domains, including robotics, healthcare, smart grids, and finance. Recent research demonstrates that attackers can efficiently exploit system vulnerabilities during the training phase to execute backdoor attacks, producing malicious actions when specific trigger patterns are present in the state observations. However, most existing backdoor attacks rely primarily on simplistic and heuristic trigger configurations, overlooking the potential efficacy of trigger optimization. To address this gap, we introduce TooBadRL (Trigger Optimization to Boost Effectiveness of Backdoor Attacks on DRL), the first framework to systematically optimize DRL backdoor triggers along three critical axes, i.e., temporal, spatial, and magnitude. Specifically, we first introduce a performance-aware adaptive freezing mechanism for injection timing. Then, we formulate dimension selection as a cooperative game, utilizing Shapley value analysis to identify the most influential state variable for the injection dimension. Furthermore, we propose a gradient-based adversarial procedure to optimize the injection magnitude under environment constraints. Evaluations on three mainstream DRL algorithms and nine benchmark tasks show that TooBadRL significantly improves attack success rates, while ensuring minimal degradation of normal task performance. These results highlight the previously underappreciated importance of principled trigger optimization in DRL backdoor attacks. The source code of TooBadRL can be found at https://github.com/S3IC-Lab/TooBadRL.
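
To make the dimension-selection idea concrete, the following is a minimal sketch of how Shapley values over state dimensions could be estimated by permutation sampling. It is not taken from the TooBadRL code base: the function names, the baseline state, and the scalar `value_fn` standing in for a policy's output are all assumptions made for illustration, and the abstract does not specify how the authors define the game's value function.

```python
import random
from typing import Callable, List, Sequence


def shapley_by_sampling(
    value_fn: Callable[[List[float]], float],  # hypothetical scalar score of a state
    state: Sequence[float],                    # observed state to explain
    baseline: Sequence[float],                 # assumed reference state ("feature absent")
    num_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of each state dimension's Shapley value."""
    rng = random.Random(seed)
    n = len(state)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)      # start with every dimension at its baseline
        previous = value_fn(current)
        for i in order:
            current[i] = state[i]     # reveal dimension i in this random order
            updated = value_fn(current)
            totals[i] += updated - previous  # marginal contribution of dimension i
            previous = updated
    return [t / num_permutations for t in totals]


def most_influential_dimension(shapley_values: Sequence[float]) -> int:
    """Index of the dimension with the largest absolute Shapley value."""
    return max(range(len(shapley_values)), key=lambda i: abs(shapley_values[i]))
```

In this sketch, the dimension returned by `most_influential_dimension` would be the candidate for trigger injection. Ranking by absolute value is one possible choice among several, and exact enumeration over all subsets could replace sampling when the state has only a few dimensions.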
