Abstract
While significant advances have been made in deep reinforcement learning, the literature offers no concrete adversarial attack strategies tailored to studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. In such attacks, the adversary targets the set of collected input data on which the deep reinforcement learning algorithm has been trained. To address this gap, we propose an adversarial attack framework designed to test the vulnerability of a state-of-the-art deep reinforcement learning algorithm to membership inference attacks. In particular, we design a series of experiments to investigate how temporal correlation, which naturally exists in reinforcement learning training data, affects the probability of information leakage. Moreover, we compare the performance of \emph{collective} and \emph{individual} membership attacks against the deep reinforcement learning algorithm. Experimental results show that the proposed attack framework is surprisingly effective: it infers training-data membership with an accuracy exceeding $84\%$ in individual mode and $97\%$ in collective mode on three different continuous control MuJoCo tasks, which raises serious privacy concerns. Finally, we show that the learning state of the reinforcement learning algorithm significantly influences the level of privacy breaches.