Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Abstract

Despite recent advances in large language models, open-source models often struggle to consistently perform well on complex reasoning tasks. Existing ensemble methods, whether applied at the token or output levels, fail to address these challenges. In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. LE-MCTS formulates step-by-step reasoning with an ensemble of language models as a Markov decision process. In this framework, states represent intermediate reasoning paths, while actions consist of generating the next reasoning step using one of the language models selected from a predefined pool. Guided by a process-based reward model, LE-MCTS performs a tree search over the reasoning steps generated by different language models, identifying the most accurate reasoning chain. Experimental results on five mathematical reasoning benchmarks demonstrate that our approach outperforms both single language model decoding algorithms and language model ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the MATH and MQA datasets, respectively, highlighting its effectiveness in solving complex reasoning problems.
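The formulation above (states as partial reasoning paths, actions as "pick a model from the pool and let it write the next step", a process reward model scoring each path) can be illustrated with a small generic Monte Carlo tree search. This is a minimal sketch under stated assumptions, not the authors' algorithm: the UCT rule, the exploration constant, the one-child-per-model expansion, and every name here (`Node`, `search`, `model_a`, `model_b`, `toy_reward`) are hypothetical, and the "language models" and "reward model" are deterministic toy functions standing in for real ones.

```python
import math

class Node:
    """A state in the MDP: the reasoning steps generated so far."""
    def __init__(self, steps, parent=None):
        self.steps = steps        # tuple of step strings
        self.parent = parent
        self.children = []        # child i was produced by models[i]
        self.visits = 0
        self.value_sum = 0.0

def uct(child, parent_visits, c=1.0):
    # Standard UCT score (an assumption; the abstract does not name the rule).
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def search(models, reward_fn, is_terminal, n_iters=30):
    root = Node(steps=())
    best_steps, best_reward = (), float("-inf")
    for _ in range(n_iters):
        # Selection: descend while every model has already been tried here.
        node = root
        while len(node.children) == len(models):
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # Expansion: the action is "which model writes the next step".
        if not is_terminal(node.steps):
            model = models[len(node.children)]   # first untried model
            child = Node(node.steps + (model(node.steps),), parent=node)
            node.children.append(child)
            node = child
        # Evaluation: score the (partial) path with the process reward stand-in.
        reward = reward_fn(node.steps)
        if is_terminal(node.steps) and reward > best_reward:
            best_steps, best_reward = node.steps, reward
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_steps, best_reward

# Toy stand-ins: each "model" is only reliable at alternating depths,
# so no single model can produce a fully correct chain on its own.
def model_a(steps):
    return "ok" if len(steps) % 2 == 0 else "err"

def model_b(steps):
    return "ok" if len(steps) % 2 == 1 else "err"

def toy_reward(steps):
    # Fraction of correct steps; a real process reward model would be learned.
    return sum(s == "ok" for s in steps) / len(steps) if steps else 0.0

def toy_is_terminal(steps):
    return len(steps) == 3

best_steps, best_reward = search([model_a, model_b], toy_reward, toy_is_terminal)
print(best_steps, best_reward)
```

In this toy setup the highest-scoring chain has to alternate between the two models, which is the intuition behind ensembling at the level of reasoning steps rather than tokens or whole outputs. With real, stochastic language models a node could reasonably hold several children per model; the one-child-per-model rule is only there to keep the sketch deterministic.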
