Local Differentially Private Regret Minimization in Reinforcement Learning

Abstract

Reinforcement learning algorithms are widely used in domains where it isdesirable to provide a personalized service. In these domains it is common thatuser data contains sensitive information that needs to be protected from thirdparties. Motivated by this, we study privacy in the context of finite-horizonMarkov Decision Processes (MDPs) by requiring information to be obfuscated onthe user side. We formulate this notion of privacy for RL by leveraging thelocal differential privacy (LDP) framework. We present an optimistic algorithmthat simultaneously satisfies LDP requirements, and achieves sublinear regret.We also establish a lower bound for regret minimization in finite-horizon MDPswith LDP guarantees. These results show that while LDP is appealing inpractical applications, the setting is inherently more complex. In particular,our results demonstrate that the cost of privacy is multiplicative whencompared to non-private settings.

Quick Read (beta)

loading the full paper ...