BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets

Abstract

Reinforcement learning (RL) lets an agent learn from trial-and-error experiences gathered while interacting with an environment. Recently, offline RL has become a popular RL paradigm because it removes the need for such interaction during training. In offline RL, data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has proven effective in critical tasks such as robot control and autonomous driving. However, less attention has been paid to the security threats facing offline RL systems. This paper focuses on backdoor attacks, in which perturbations (triggers) are added to the data (observations) so that the agent takes high-reward actions on normal observations and low-reward actions on observations injected with triggers. We propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors in RL agents by poisoning the offline RL dataset, and we evaluate how different offline RL algorithms react to this attack. Our experiments on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. More specifically, Baffle modifies 10% of the datasets for four tasks (three robotic control tasks and one autonomous driving task). Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically, by 63.2%, 53.9%, 64.7%, and 47.4% on average in the four tasks. The backdoor persists after the poisoned agents are fine-tuned on clean datasets. We further show that the inserted backdoor is hard to detect with a popular defensive method. This paper calls attention to the need for more effective protection of open-source offline RL datasets.
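
To make the poisoning step concrete, the following is a minimal sketch of dataset poisoning at a fixed rate, not the authors' implementation. It assumes the dataset is stored as NumPy arrays of observations, actions, and rewards. The function name `poison_dataset`, the parameters `trigger_dims` and `trigger_value`, and the array of low-reward actions `bad_actions` are all hypothetical. Relabeling the poisoned transitions with a high reward is likewise an assumption made for illustration; the abstract does not say how rewards are handled.

```python
import numpy as np


def poison_dataset(observations, actions, rewards, bad_actions,
                   trigger_dims, trigger_value,
                   poison_rate=0.1, high_reward=None, seed=0):
    """Return poisoned copies of the arrays and the poisoned indices.

    Hypothetical sketch: a fixed fraction of transitions receives a
    trigger in the observation and a low-reward action in place of the
    original action.
    """
    rng = np.random.default_rng(seed)
    n = len(observations)
    n_poison = int(round(poison_rate * n))
    # Pick the transitions to poison, without repetition.
    idx = rng.choice(n, size=n_poison, replace=False)

    # Work on copies so the clean dataset is left untouched.
    obs_p = observations.copy()
    act_p = actions.copy()
    rew_p = rewards.copy()

    # Stamp the trigger: overwrite the chosen observation dimensions.
    obs_p[np.ix_(idx, trigger_dims)] = trigger_value
    # Pair the triggered observation with a low-reward action.
    act_p[idx] = bad_actions[idx]
    # Assumption (not from the abstract): relabel the reward as high so
    # that training associates the trigger with the bad action.
    if high_reward is None:
        high_reward = rewards.max()
    rew_p[idx] = high_reward

    return obs_p, act_p, rew_p, idx
```

With the default `poison_rate=0.1`, one tenth of the transitions are modified, matching the 10% figure above; all other transitions are returned unchanged, which is consistent with poisoned agents still performing well on normal observations.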
