Non-myopic Generation of Language Models for Reasoning and Planning

Abstract

Large Language Models (LLMs) have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in domains such as mathematical problem-solving and coding, LLMs face challenges in ensuring reliable and optimal planning due to the inherently myopic nature of autoregressive decoding. This paper revisits LLM reasoning from an optimal-control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements in a wide range of tasks for math, coding, and agents. Furthermore, Predictive-Decoding demonstrates computational efficiency, outperforming search baselines with reduced computational resources. This study provides insights into optimizing LLM planning capabilities.
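
The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea of foresight-based re-weighting in a receding-horizon (Model Predictive Control style) loop, not the paper's implementation. Every name here (`rollout`, `num_rollouts`, `horizon`, `temperature`, `foresight_reweighted_step`, `decode`) is a hypothetical choice made for illustration, as is the use of a softmax over trajectory scores as the re-weighting rule.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A rollout is a (steps, score) pair: a sampled continuation of the current
# prefix and a scalar score for it (for example, its log-probability).
# Both the type and the scoring convention are assumptions of this sketch.
Rollout = Tuple[List[str], float]
RolloutFn = Callable[[List[str], int], Rollout]


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn trajectory scores into sampling weights (numerically stable)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def foresight_reweighted_step(
    prefix: List[str],
    rollout: RolloutFn,
    num_rollouts: int,
    horizon: int,
    temperature: float,
    rng: random.Random,
) -> str:
    """Choose the next step by looking ahead, then re-weighting candidates.

    Assumes horizon >= 1 so that every rollout contains at least one step.
    """
    # Sample several foresight trajectories from the current prefix.
    candidates = [rollout(prefix, horizon) for _ in range(num_rollouts)]
    # Re-weight the candidates by how well their full look-ahead scores.
    weights = softmax([score for _, score in candidates], temperature)
    steps, _ = rng.choices(candidates, weights=weights, k=1)[0]
    # Commit only the first step; the rest of the trajectory is discarded.
    return steps[0]


def decode(
    rollout: RolloutFn,
    max_steps: int,
    num_rollouts: int = 4,
    horizon: int = 3,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[str]:
    """Receding-horizon loop: re-plan from scratch after every committed step."""
    rng = random.Random(seed)
    prefix: List[str] = []
    for _ in range(max_steps):
        step = foresight_reweighted_step(
            prefix, rollout, num_rollouts, horizon, temperature, rng
        )
        prefix.append(step)
    return prefix
```

In a real setting, `rollout` would wrap a language-model call that samples `horizon` further steps from `prefix` and returns a score for them. The point of the sketch is the contrast with plain autoregressive decoding: each step is chosen according to how its sampled futures score, rather than according to its immediate probability alone.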
