Self-Steering Language Models

Abstract

While test-time reasoning enables language models (LMs) to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure--both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B or Qwen3-1.7B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. Our work opens up a design space of highly-parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.
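To make the Planner/Follower split concrete, the following is a minimal toy sketch of a Monte Carlo inference loop of the general kind the abstract alludes to: a population of partial generations is extended, checked against a verifier, and resampled. It is not DisCIPL's actual inference program. The names `follower_propose`, `verify`, `smc_generate`, and `VOCAB` are hypothetical, the "Follower" is a uniform random word sampler rather than a real LM, and the no-repeated-words constraint is an invented stand-in for a task-specific check a Planner might write.

```python
import random

# Hypothetical toy vocabulary; a real Follower would be a small LM.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "quickly", "home"]


def follower_propose(prefix, rng):
    """Stand-in for a Follower model: sample the next word uniformly."""
    return rng.choice(VOCAB)


def verify(words):
    """Toy constraint (assumed for illustration): no word may repeat."""
    return len(set(words)) == len(words)


def smc_generate(num_particles, length, seed=0):
    """Extend, weigh, and resample a population of partial generations.

    Returns a list of `num_particles` word lists that all satisfy `verify`,
    or an empty list if every particle violates the constraint at some step.
    """
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    for _ in range(length):
        # Extend: each particle asks the Follower for one more word.
        particles = [p + [follower_propose(p, rng)] for p in particles]
        # Weigh: the verifier assigns weight 1 to valid prefixes, 0 otherwise.
        weights = [1.0 if verify(p) else 0.0 for p in particles]
        if sum(weights) == 0:
            return []  # the whole population died out
        # Resample: surviving particles are duplicated in place of dead ones.
        survivors = rng.choices(particles, weights=weights, k=num_particles)
        particles = [list(p) for p in survivors]
    return particles


if __name__ == "__main__":
    for words in smc_generate(num_particles=8, length=5, seed=0)[:3]:
        print(" ".join(words))
```

Unlike best-of-N sampling, which only checks complete generations, this loop discards invalid prefixes at every step and reallocates their compute to prefixes that still satisfy the constraint.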
