STaR: Bootstrapping Reasoning With Reasoning

Abstract

Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
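
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `star_loop`, its parameters, and the `generate` and `finetune` callables are hypothetical names introduced here, and the sketch assumes that answers can be checked by exact string match and that `generate` already includes the few rationale examples in its prompt.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Model = TypeVar("Model")
Example = Tuple[str, str]        # (question, correct answer)
Kept = Tuple[str, str, str]      # (question, rationale, answer)

def star_loop(
    dataset: List[Example],
    generate: Callable[[Model, str, Optional[str]], Tuple[str, str]],
    finetune: Callable[[Model, List[Kept]], Model],
    model: Model,
    n_iterations: int,
) -> Model:
    """Hypothetical sketch of the generate / retry / fine-tune / repeat loop.

    generate(model, question, hint) returns (rationale, answer); hint is
    None on the first attempt and the correct answer on the retry.
    finetune(model, kept) returns a model trained on the kept rationales.
    """
    for _ in range(n_iterations):
        kept: List[Kept] = []
        for question, gold in dataset:
            # First attempt: few-shot rationale generation, no hint.
            rationale, answer = generate(model, question, None)
            if answer != gold:
                # Wrong answer: try again, this time given the correct answer.
                rationale, answer = generate(model, question, gold)
            if answer == gold:
                # Keep only rationales that ultimately yielded the correct answer.
                kept.append((question, rationale, answer))
        # Fine-tune on everything kept in this pass, then repeat.
        model = finetune(model, kept)
    return model
```

In practice, `generate` would wrap a prompted language model call and `finetune` a training run; both are left abstract here because the abstract does not specify them.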
