The 20 questions game to distinguish large language models

Abstract

In a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. The goal is to use a small set of (benign) binary questions, typically under 20. We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions. After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs by using half as many questions for the same task. These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.
