TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation

Abstract

As large language models (LMs) advance, there is an increasing need to control their outputs to align with human values (e.g., detoxification) or desired attributes (e.g., personalization, topic). However, autoregressive models focus on next-token predictions and struggle with global properties that require looking ahead. Existing solutions either tune or post-train LMs for each new attribute - expensive and inflexible - or approximate the Expected Attribute Probability (EAP) of future sequences by sampling or training, which is slow and unreliable for rare attributes. We introduce TRACE (Tractable Probabilistic Reasoning for Adaptable Controllable gEneration), a novel framework that efficiently computes EAP and adapts to new attributes through tractable probabilistic reasoning and lightweight control. TRACE distills a Hidden Markov Model (HMM) from an LM and pairs it with a small classifier to estimate attribute probabilities, enabling exact EAP computation over the HMM's predicted futures. This EAP is then used to reweigh the LM's next-token probabilities for globally compliant continuations. Empirically, TRACE achieves state-of-the-art results in detoxification with only 10% decoding overhead, adapts to 76 low-resource personalized LLMs within seconds, and seamlessly extends to composite attributes.
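
The reweighting idea described above can be illustrated with a minimal toy sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the attribute classifier is taken to factorize into per-token weights `w[x]` in [0, 1], so that a sequence's attribute probability is the product of its token weights; the HMM parameters `A` (transitions) and `B` (emissions), the `belief` over the current hidden state, and all numbers are made up for illustration; and the function names `future_eap` and `reweight_next_token` are hypothetical. Under these assumptions, the expected attribute probability over all HMM futures can be computed exactly with a backward recursion rather than by sampling.

```python
import numpy as np

def future_eap(A, B, w, steps):
    """Return e, where e[k][z] is the expected product of token weights over
    the next k tokens, given that the first of them is emitted from state z.
    Assumes a fully factorized classifier (an assumption of this sketch)."""
    e = [np.ones(A.shape[0])]
    emit = B @ w                      # expected weight of one emitted token, per state
    for _ in range(steps):
        e.append(emit * (A @ e[-1]))  # emit here, then move to the next state
    return e

def reweight_next_token(lm_probs, belief, A, B, w, remaining):
    """Reweight LM next-token probabilities by the exact EAP under the HMM.

    belief    : p(z_t | prefix), shape (H,)
    remaining : number of tokens still to generate, including the next one
    """
    e = future_eap(A, B, w, remaining - 1)
    pred = belief @ A                                  # p(z_{t+1} | prefix)
    joint = pred[:, None] * B                          # p(z_{t+1}, x_{t+1} | prefix)
    post = joint / joint.sum(axis=0, keepdims=True)    # p(z_{t+1} | prefix, x_{t+1})
    tail = A @ e[remaining - 1]                        # expected weight of tokens after x_{t+1}
    eap = w * (post * tail[:, None]).sum(axis=0)       # one EAP value per candidate token
    scores = lm_probs * eap                            # prefix weight is constant and cancels
    return scores / scores.sum(), eap

# Toy setup (all values hypothetical): 2 hidden states, 3 tokens, token 2 is "toxic".
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
w = np.array([1.0, 0.9, 0.05])
belief = np.array([0.5, 0.5])
lm_probs = np.array([0.3, 0.3, 0.4])

new_probs, eap = reweight_next_token(lm_probs, belief, A, B, w, remaining=3)
print("EAP per candidate token:", eap)
print("reweighted next-token distribution:", new_probs)
```

In this sketch the cost per decoding step is a handful of small matrix-vector products, which gives some intuition for why exact computation over a tractable model can be cheap compared with sampling futures from the LM itself.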
