Reinforcement Unlearning

Abstract

Machine unlearning refers to the process of mitigating the influence of specific training data on machine learning models in response to removal requests from data owners. However, one important area has been largely overlooked in unlearning research: reinforcement learning. Reinforcement learning focuses on training an agent to make optimal decisions within an environment so as to maximize its cumulative reward. During training, the agent tends to memorize features of the environment, which raises significant privacy concerns. Under data protection regulations, the owner of an environment holds the right to revoke access to the agent's training data. This necessitates a new and pressing research field known as reinforcement unlearning. Reinforcement unlearning focuses on revoking entire environments rather than individual data samples. This unique characteristic presents three distinct challenges: 1) how to design unlearning schemes for environments; 2) how to avoid degrading the agent's performance in the remaining environments; and 3) how to evaluate the effectiveness of unlearning. To tackle these challenges, we propose two reinforcement unlearning methods. The first method is based on decremental reinforcement learning, which aims to gradually erase the agent's previously acquired knowledge. The second method leverages environment poisoning attacks, which encourage the agent to learn new, albeit incorrect, knowledge, thereby removing what it learned from the unlearning environment. In particular, to tackle the third challenge, we introduce the concept of an "environment inference attack" to evaluate unlearning outcomes.
