Abstract
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models. The idea is to sample a diverse set of outputs from a language model and return the most consistent answer in the set. This ensembling method improves reasoning accuracy when combined with chain-of-thought prompting. On arithmetic and commonsense reasoning benchmarks, we find that self-consistency yields significant accuracy improvements across a variety of datasets, such as GSM8K (+10%), SVAMP (+14%), MultiArith (+24%), CommonsenseQA (+5%), and ARC (easy +4%, challenge +5%).
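The sample-then-aggregate idea described above can be illustrated with a minimal Python sketch. It reads "most consistent answer" as a plain majority vote over the sampled final answers, which is an assumption made here for illustration and may differ from the aggregation used in the full paper. The names `self_consistency`, `sample_answer`, and `make_stub_sampler` are hypothetical: `sample_answer` stands in for one stochastic chain-of-thought call to a language model followed by extraction of the final answer, and the stub replays canned answers so the example runs without a model.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable, Optional


def self_consistency(
    sample_answer: Callable[[str], Optional[str]],
    prompt: str,
    num_samples: int = 10,
) -> Optional[str]:
    """Sample several answers for one prompt and return the most frequent one.

    `sample_answer` is a hypothetical callable: one stochastic model call that
    returns the final answer parsed from a sampled reasoning path, or None
    when no answer could be parsed.
    """
    answers = [sample_answer(prompt) for _ in range(num_samples)]
    # Drop samples where no final answer could be extracted.
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    # Majority vote (assumed aggregation rule).
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(outputs: Iterable[Optional[str]]) -> Callable[[str], Optional[str]]:
    """Build a stand-in sampler that replays a fixed sequence of answers."""
    replay = cycle(list(outputs))
    return lambda prompt: next(replay)


if __name__ == "__main__":
    # Five canned "sampled" answers; three of them agree on "18".
    sampler = make_stub_sampler(["18", "26", "18", None, "18"])
    print(self_consistency(sampler, "hypothetical word problem", num_samples=5))
```

In practice the stub would be replaced by a real sampling call with a nonzero temperature, so that the sampled reasoning paths actually differ; with greedy decoding every sample would be identical and the vote would add nothing.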