Token-Budget-Aware LLM Reasoning

Abstract

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt, but the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework, which dynamically estimates token budgets for different problems based on reasoning complexity and uses the estimated token budgets to guide the reasoning process. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE.
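As a rough illustration of the idea described in the abstract (estimate a budget per problem, then state it in the prompt), the following is a minimal sketch and not the authors' implementation. The function names, the prompt wording, and the word-count heuristic are all assumptions made for this example; the abstract does not specify how the budget is estimated beyond "reasoning complexity".

```python
from typing import Callable


def build_budget_prompt(question: str, budget: int) -> str:
    """Append a step-by-step instruction with an explicit token budget.

    The exact wording is an assumption for illustration only.
    """
    if budget <= 0:
        raise ValueError("budget must be a positive number of tokens")
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens."
    )


def word_count_estimator(question: str) -> int:
    """Hypothetical stand-in for a complexity-based budget estimator.

    It simply scales with question length; a real estimator would
    reflect how much reasoning the problem actually requires.
    """
    return max(16, 4 * len(question.split()))


def budget_aware_prompt(
    question: str, estimate_budget: Callable[[str], int]
) -> str:
    """Estimate a per-question budget, then embed it in the prompt."""
    budget = estimate_budget(question)
    return build_budget_prompt(question, budget)


if __name__ == "__main__":
    print(budget_aware_prompt("What is 2 + 3?", word_count_estimator))
```

Passing the estimator as a callable keeps the prompt construction separate from the budget choice, which the abstract identifies as the factor that most affects compression.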
