Differentially Private Reinforcement Learning with Linear Function Approximation

  • 2022-01-18 15:25:24
  • Xingyu Zhou

Abstract

Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Unlike existing private RL algorithms, which work only on tabular finite-state, finite-action MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular, linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms, based on value iteration and policy optimization respectively, and show that they achieve sub-linear regret while guaranteeing privacy protection. Moreover, the regret bounds are independent of the number of states and scale at most logarithmically with the number of actions, making the algorithms suitable for privacy protection in today's large-scale personalized services. Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers, which not only generalizes previous results for non-private learning but also serves as a building block for general private reinforcement learning.
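The abstract credits its results to learning in linear mixture MDPs "under changing regularizers" but gives no formulas, so the following is only a hedged illustration of that idea, not the paper's algorithm. In a linear mixture MDP the transition kernel is P(s'|s,a) = <phi(s'|s,a), theta*> for a known feature map phi and an unknown parameter theta*, which is usually estimated by a ridge-type regression on the statistics Lambda = sum phi phi^T and u = sum phi * y. The sketch assumes a construction common in private linear bandits: the fixed ridge term lambda * I is replaced by a per-episode matrix H (symmetric Gaussian noise plus a positive `shift` on the diagonal) and a noise vector h, giving theta_hat = (Lambda + H)^{-1} (u + h). All helper names, the Gaussian noise, and the `shift` parameter are hypothetical, and the noise is not calibrated to any privacy guarantee.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrix and vector are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gram_and_moment(features: Sequence[Vector],
                    targets: Sequence[float]) -> Tuple[Matrix, Vector]:
    """Regression statistics: Lambda = sum phi phi^T and u = sum phi * y."""
    d = len(features[0])
    gram = [[0.0] * d for _ in range(d)]
    moment = [0.0] * d
    for phi, y in zip(features, targets):
        for i in range(d):
            moment[i] += phi[i] * y
            for j in range(d):
                gram[i][j] += phi[i] * phi[j]
    return gram, moment


def perturbed_regularizer(d: int, noise_scale: float, shift: float,
                          rng: random.Random) -> Tuple[Matrix, Vector]:
    """Hypothetical per-episode regularizer: symmetric Gaussian noise + shift * I.

    With noise_scale = 0 this reduces to the ordinary ridge term shift * I.
    A large enough shift is assumed to keep Lambda + H positive definite.
    """
    h_mat = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            z = rng.gauss(0.0, noise_scale)
            h_mat[i][j] = z
            h_mat[j][i] = z  # keep the perturbation symmetric
        h_mat[i][i] += shift
    h_vec = [rng.gauss(0.0, noise_scale) for _ in range(d)]
    return h_mat, h_vec


def regularized_estimate(gram: Matrix, moment: Vector,
                         reg_mat: Matrix, reg_vec: Vector) -> Vector:
    """theta_hat = (Lambda + H)^{-1} (u + h)."""
    d = len(moment)
    a = [[gram[i][j] + reg_mat[i][j] for j in range(d)] for i in range(d)]
    b = [moment[i] + reg_vec[i] for i in range(d)]
    return solve(a, b)
```

With `noise_scale=0.0` and `shift=1.0` the estimate is plain ridge regression with lambda = 1; drawing a fresh `(H, h)` each episode with a nonzero `noise_scale` is what makes the regularizer "change" over time. Keeping only the aggregated statistics Lambda and u, rather than raw trajectories, is what allows them to be perturbed before the estimate is formed.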

 
