Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

Abstract

Chain-of-thought (CoT) decoding enables language models to improve reasoning performance at the cost of high generation latency in decoding. Recent proposals have explored variants of contemplation tokens, a term we introduce that refers to special tokens used during inference to allow for extra computation. Prior work has considered fixed-length sequences drawn from a discrete set of embeddings as contemplation tokens. Here we propose Compressed Chain-of-Thought (CCoT), a framework to generate contentful and continuous contemplation tokens of variable sequence length. The generated contemplation tokens are compressed representations of explicit reasoning chains, and our method can be applied to off-the-shelf decoder language models. Through experiments, we illustrate how CCoT enables additional reasoning over dense contentful representations to achieve corresponding improvements in accuracy. Moreover, the reasoning improvements can be adaptively modified on demand by controlling the number of contemplation tokens generated.
