Abstract
The limited reasoning capabilities of small language models (SLMs) cast doubt on their suitability for tasks demanding deep, multi-step logical deduction. This paper introduces a framework called Small Reasons, Large Hints (SMART), which selectively augments SLM reasoning with targeted guidance from large language models (LLMs). Inspired by the concept of cognitive scaffolding, SMART employs a score-based evaluation to identify uncertain reasoning steps and injects corrective LLM-generated reasoning only when necessary. By framing structured reasoning as an optimal policy search, our approach steers the reasoning trajectory toward correct solutions without exhaustive sampling. Our experiments on mathematical reasoning datasets demonstrate that targeted external scaffolding significantly improves performance, paving the way for collaborative use of SLMs and LLMs to tackle complex reasoning tasks that are currently unsolvable by SLMs alone.