SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning

Abstract

Test-Time Scaling (TTS) refers to approaches that improve reasoning performance by allocating extra computation during inference, without altering the model's parameters. While existing TTS methods operate in a discrete token space by generating more intermediate steps, recent studies in Coconut and SoftCoT have demonstrated that thinking in the continuous latent space can further enhance the reasoning performance. Such latent thoughts encode informative thinking without the information loss associated with autoregressive token generation, sparking increased interest in continuous-space reasoning. Unlike discrete decoding, where repeated sampling enables exploring diverse reasoning paths, latent representations in continuous space are fixed for a given input, which limits diverse exploration, as all decoded paths originate from the same latent thought. To overcome this limitation, we introduce SoftCoT++ to extend SoftCoT to the Test-Time Scaling paradigm by enabling diverse exploration of thinking paths. Specifically, we perturb latent thoughts via multiple specialized initial tokens and apply contrastive learning to promote diversity among soft thought representations. Experiments across five reasoning benchmarks and two distinct LLM architectures demonstrate that SoftCoT++ significantly boosts SoftCoT and also outperforms SoftCoT with self-consistency scaling. Moreover, it shows strong compatibility with conventional scaling techniques such as self-consistency. Source code is available at https://github.com/xuyige/SoftCoT.
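
The abstract names two ingredients that a small example can make concrete: self-consistency aggregation over several reasoning paths, and diversity among soft thought representations. The sketch below is illustrative only and is not taken from the SoftCoT repository. The function names, the toy vectors, and the use of mean pairwise cosine similarity as a diversity measure are assumptions made for this example; the paper's contrastive objective may be defined differently.

```python
from collections import Counter
from itertools import combinations
import math


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer among sampled paths."""
    return Counter(answers).most_common(1)[0][0]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(thoughts):
    """Hypothetical diversity proxy: average cosine similarity over all pairs.

    Lower values mean the soft thoughts point in more different directions.
    This stands in for a contrastive objective; it is not the paper's loss.
    """
    pairs = list(combinations(thoughts, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy data: three soft thoughts (assumed 2-d here) and the answers decoded from them.
soft_thoughts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoded_answers = ["18", "18", "20"]

final_answer = majority_vote(decoded_answers)
similarity = mean_pairwise_cosine(soft_thoughts)
```

If every path started from the same latent thought, `mean_pairwise_cosine` would be 1.0, which is the fixed-representation limitation the abstract describes. Distinct initial tokens are meant to push that similarity down, so the paths handed to `majority_vote` differ from one another.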
