Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons

Abstract

Large Language Models (LLMs) trained on massive data capture rich information embedded in the training data. However, this also introduces the risk of privacy leakage, particularly involving personally identifiable information (PII). Although previous studies have shown that this risk can be mitigated through methods such as privacy neurons, they all assume that both the (sensitive) training data and user queries are in English. We show that they cannot defend against the privacy leakage in cross-lingual contexts: even if the training data is exclusively in one language, these (private) models may still reveal private information when queried in another language. In this work, we first investigate the information flow of cross-lingual privacy leakage to give a better understanding. We find that LLMs process private information in the middle layers, where representations are largely shared across languages. The risk of leakage peaks when converted to a language-specific space in later layers. Based on this, we identify privacy-universal neurons and language-specific privacy neurons. Privacy-universal neurons influence privacy leakage across all languages, while language-specific privacy neurons are only related to specific languages. By deactivating these neurons, the cross-lingual privacy leakage risk is reduced by 23.3%-31.6%.
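
To make the idea of "deactivating" neurons concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy one-hidden-layer network, the function name `mlp_forward`, and the `deactivated` parameter are all hypothetical, and the step of identifying which neurons are privacy-related is not shown. It only illustrates that deactivation can be modeled as forcing selected hidden activations to zero during the forward pass.

```python
def mlp_forward(x, w_in, w_out, deactivated=frozenset()):
    """Forward pass of a toy one-hidden-layer ReLU network.

    x:           input vector (list of floats)
    w_in:        one weight row per hidden neuron, each of length len(x)
    w_out:       one weight row per output unit, each of length len(w_in)
    deactivated: indices of hidden neurons whose activation is forced to 0
                 (hypothetical stand-in for a set of "privacy neurons")
    """
    hidden = []
    for j, row in enumerate(w_in):
        pre_activation = sum(w * xi for w, xi in zip(row, x))
        activation = max(0.0, pre_activation)  # ReLU
        # Deactivation: the neuron contributes nothing downstream.
        hidden.append(0.0 if j in deactivated else activation)
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


# Toy example: three hidden neurons, one output unit.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 1.0, 1.0]]

baseline = mlp_forward(x, w_in, w_out)
ablated = mlp_forward(x, w_in, w_out, deactivated={2})
```

In this toy setting, comparing `baseline` with `ablated` shows how much the output depends on the deactivated neuron. In a real LLM the same idea would be applied to selected neurons inside the model's layers, and the effect would be measured on a leakage metric rather than on a raw output value.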
