Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis

Abstract

Language Confusion is a phenomenon where Large Language Models (LLMs) generate text that is neither in the desired language, nor in a contextually appropriate language. This phenomenon presents a critical challenge in text generation by LLMs, often appearing as erratic and unpredictable behavior. We hypothesize that there are linguistic regularities to this inherent vulnerability in LLMs and shed light on patterns of language confusion across LLMs. We introduce a novel metric, Language Confusion Entropy, designed to directly measure and quantify this confusion, based on language distributions informed by linguistic typology and lexical variation. Comprehensive comparisons with the Language Confusion Benchmark (Marchisio et al., 2024) confirm the effectiveness of our metric, revealing patterns of language confusion across LLMs. We further link language confusion to LLM security, and find patterns in the case of multilingual embedding inversion attacks. Our analysis demonstrates that linguistic typology offers theoretically grounded interpretation, and valuable insights into leveraging language similarities as a prior for LLM alignment and security.
