Learning Composable Chains-of-Thought

Abstract

A common approach for teaching large language models (LLMs) to reason is to train on chain-of-thought (CoT) traces of in-distribution reasoning problems, but such annotated data is costly to obtain for every problem of interest. We want reasoning models to generalize beyond their training distribution, and ideally to generalize compositionally: combine atomic reasoning skills to solve harder, unseen reasoning tasks. We take a step towards compositional generalization of reasoning skills when addressing a target compositional task that has no labeled CoT data. We find that simply training models on CoT data of atomic tasks leads to limited generalization, but minimally modifying CoT formats of constituent atomic tasks to be composable can lead to improvements. We can train "atomic CoT" models on the atomic tasks with Composable CoT data and combine them with multitask learning or model merging for better zero-shot performance on the target compositional task. Such a combined model can be further bootstrapped on a small amount of compositional data using rejection sampling fine-tuning (RFT). Results on string operations and natural language skill compositions show that training LLMs on Composable CoT outperforms multitask learning and continued fine-tuning baselines within a given training data budget.
