Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics

Abstract

When a language model generates text, the selection of individual tokens might lead it down very different reasoning paths, making uncertainty difficult to quantify. In this work, we consider whether reasoning language models represent the alternate paths that they could take during generation. To test this hypothesis, we use hidden activations to control and predict a language model's uncertainty during chain-of-thought reasoning. In our experiments, we find a clear correlation between how uncertain a model is at different tokens, and how easily the model can be steered by controlling its activations. This suggests that activation interventions are most effective when there are alternate paths available to the model -- in other words, when it has not yet committed to a particular final answer. We also find that hidden activations can predict a model's future outcome distribution, demonstrating that models implicitly represent the space of possible paths.
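As a rough illustration of what "how uncertain a model is at different tokens" could mean in practice, the sketch below computes the entropy of an empirical outcome distribution at two token positions. It assumes that uncertainty at a token is estimated by resampling continuations from that position and tallying the final answers; the abstract does not specify the estimator, so this choice, the helper names `outcome_distribution` and `entropy_bits`, and the sample answers are all hypothetical.

```python
import math
from collections import Counter


def outcome_distribution(answers):
    """Empirical distribution over final answers from sampled continuations."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def entropy_bits(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical final answers from continuations resampled at two positions
# (illustrative data only, not results from the paper).
early = ["A", "B", "A", "C", "B", "A", "B", "C"]  # several paths still open
late = ["A"] * 8                                   # model has committed

h_early = entropy_bits(outcome_distribution(early))
h_late = entropy_bits(outcome_distribution(late))
```

Under this assumption, a position with high entropy is one where alternate paths remain available, while an entropy of zero corresponds to a model that has already committed to a single final answer.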
