An Overview of Privacy in Machine Learning

Abstract

Over the past few years, providers such as Google, Microsoft, and Amazon have started to provide customers with access to software interfaces allowing them to easily embed machine learning tasks into their applications. Overall, organizations can now use Machine Learning as a Service (MLaaS) engines to outsource complex tasks, e.g., training classifiers, performing predictions, clustering, etc. They can also let others query models trained on their data. Naturally, this approach can also be used (and is often advocated) in other contexts, including government collaborations, citizen science projects, and business-to-business partnerships. However, if malicious users were able to recover data used to train these models, the resulting information leakage would create serious issues. Likewise, if the inner parameters of the model are considered proprietary information, then access to the model should not allow an adversary to learn such parameters. In this document, we set out to review privacy challenges in this space, providing a systematic review of the relevant research literature and exploring possible countermeasures. More specifically, we provide ample background information on relevant concepts around machine learning and privacy. Then, we discuss possible adversarial models and settings, cover a wide range of attacks that relate to private and/or sensitive information leakage, and review recent results attempting to defend against such attacks. Finally, we conclude with a list of open problems that require more work, including the need for better evaluations, more targeted defenses, and the study of the relation to policy and data protection efforts.
