Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models

Abstract

Web-based Large Language Model (LLM) services have been widely adopted and have become an integral part of our Internet experience. Third-party plugins enhance the functionality of LLMs by enabling access to real-world data and services. However, the privacy consequences associated with these services and their third-party plugins are not well understood. Sensitive prompt data are stored, processed, and shared by cloud-based LLM providers and third-party plugins. In this paper, we propose Casper, a prompt sanitization technique that aims to protect user privacy by detecting and removing sensitive information from user inputs before sending them to LLM services. Casper runs entirely on the user's device as a browser extension and does not require any changes to the online LLM services. At the core of Casper is a three-layered sanitization mechanism consisting of a rule-based filter, a Machine Learning (ML)-based named entity recognizer, and a browser-based local LLM topic identifier. We evaluate Casper on a dataset of 4,000 synthesized prompts and show that it can effectively filter out Personally Identifiable Information (PII) and privacy-sensitive topics with high accuracy, at 98.5% and 89.9%, respectively.
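
The three-layered mechanism described in the abstract can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not Casper's implementation: the regular expressions, the placeholder tokens such as `[EMAIL]`, and the function names `rule_filter` and `sanitize` are all hypothetical, and the named entity recognizer and topic identifier are represented by caller-supplied functions rather than real models. Python is used for brevity, although Casper itself runs as a browser extension.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical patterns for illustration only; the paper's actual rules
# are not given in the abstract.
RULES: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def rule_filter(prompt: str) -> str:
    """Layer 1: replace every pattern match with a placeholder token."""
    for label, pattern in RULES.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


def sanitize(
    prompt: str,
    ner: Optional[Callable[[str], List[str]]] = None,
    topic: Optional[Callable[[str], List[str]]] = None,
) -> Tuple[str, List[str]]:
    """Run the three layers in order; return the sanitized text and flagged topics.

    `ner` stands in for an ML-based entity recognizer and returns entity strings.
    `topic` stands in for a local LLM and returns sensitive topic labels.
    Both are assumptions supplied by the caller.
    """
    text = rule_filter(prompt)
    if ner is not None:
        # Layer 2: mask each entity string reported by the recognizer.
        for entity in ner(text):
            text = text.replace(entity, "[ENTITY]")
    # Layer 3: report topics instead of rewriting, so the user can decide.
    topics = topic(text) if topic is not None else []
    return text, topics
```

In this sketch the cheap rule-based layer runs first, so the more expensive layers only see text that has already had pattern-shaped identifiers masked. Whether Casper orders or combines its layers this way is not stated in the abstract.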
