Publicly-Detectable Watermarking for Language Models

Abstract

We present a highly detectable, trustless watermarking scheme for LLMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LLM output using rejection sampling. We prove that our scheme is cryptographically correct, sound, and distortion-free. We make novel uses of error-correction techniques to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and make empirical measurements over open models in the 2.7B to 70B parameter range. Our experiments suggest that our formal claims are met in practice.
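As a rough illustration of the two ideas named above (rejection sampling to embed bits, and keyless detection), the toy sketch below hides a short bit string in a generated token sequence. This is not the authors' construction. It assumes a hypothetical uniform "model" over a made-up vocabulary (`toy_model_sample`), uses SHA-256 of (context, token) as a public per-token bit (`token_bit`), and omits the signature, the error correction, and any distortion-freeness guarantee. All function names here are hypothetical.

```python
import hashlib
import random


def token_bit(context, token):
    """Public, keyless hash of (context, token) -> one bit; anyone can recompute it."""
    data = ("\x1f".join(context) + "\x1e" + token).encode("utf-8")
    return hashlib.sha256(data).digest()[0] & 1


def toy_model_sample(rng, vocab):
    """Hypothetical stand-in for an LLM's next-token sampler (uniform over vocab)."""
    return rng.choice(vocab)


def embed_bits(bits, vocab, rng, max_tries=64):
    """Rejection-sample each token until its public hash bit equals the next message bit."""
    tokens = []
    for bit in bits:
        for _ in range(max_tries):
            candidate = toy_model_sample(rng, vocab)
            if token_bit(tokens, candidate) == bit:
                break
        else:
            # A low-entropy distribution can make a bit impossible to embed;
            # the abstract's error-correction techniques target this case (omitted here).
            raise RuntimeError("could not embed bit: distribution has too little entropy")
        tokens.append(candidate)
    return tokens


def extract_bits(tokens):
    """Detection side: recompute each token's bit from public data only (no secret key)."""
    return [token_bit(tokens[:i], tok) for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    vocab = [f"tok{i}" for i in range(100)]
    message = [1, 0, 1, 1, 0, 0, 1, 0]  # in a real scheme: bits of a signature
    text = embed_bits(message, vocab, random.Random(0))
    assert extract_bits(text) == message
    print(text)
```

Because `extract_bits` uses only the text and a public hash, detection needs no secret, which mirrors the trustless property described above; in a full scheme the recovered bits would then be checked against a public verification key.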
