Abstract
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints. Imposing such constraints can be naturally framed as probabilistic conditioning, but exact generation from the resulting distribution -- which can differ substantially from the LM's base distribution -- is generally intractable. In this work, we develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). Our SMC framework allows us to flexibly incorporate domain- and problem-specific constraints at inference time, and efficiently reallocate computational resources in light of new information during the course of generation. By comparing to a number of alternatives and ablations on four challenging domains -- Python code generation for data science, text-to-SQL, goal inference, and molecule synthesis -- we demonstrate that, with little overhead, our approach allows small open-source language models to outperform models over 8x larger, as well as closed-source, fine-tuned ones. In support of the probabilistic perspective, we show that these performance improvements are driven by better approximation to the posterior distribution. Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language, giving users a simple, programmable way to apply SMC to a broad variety of controlled generation problems.