Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

Abstract

While significant research advances have been made in the field of deep reinforcement learning, a major challenge to its widespread industrial adoption that has recently surfaced but remains little explored is the potential vulnerability to privacy breaches. In particular, there have been no concrete adversarial attack strategies in the literature tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. To address this gap, we propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks. More specifically, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Furthermore, we study the differences in the performance of \emph{collective} and \emph{individual} membership attacks against deep reinforcement learning algorithms. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring the data used during deep reinforcement learning training, with an accuracy exceeding $84\%$ in individual and $97\%$ in collective mode on two different control tasks in OpenAI Gym, which raises serious privacy concerns in the deployment of models resulting from deep reinforcement learning. Moreover, we show that the learning state of a reinforcement learning algorithm significantly influences the level of the privacy breach.
