Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality

Abstract

Supervised fine-tuning (SFT) is a critical step in aligning large language models (LLMs) with human instructions and values, yet many aspects of SFT remain poorly understood. We trained a wide range of base models on a variety of datasets including code generation, mathematical reasoning, and general-domain tasks, resulting in 1,000+ SFT models under controlled conditions. We then identified the dataset properties that matter most and examined the layer-wise modifications introduced by SFT. Our findings reveal that some training-task synergies persist across all models while others vary substantially, emphasizing the importance of model-specific strategies. Moreover, we demonstrate that perplexity consistently predicts SFT effectiveness, often surpassing superficial similarity between the training data and the benchmark, and that mid-layer weight changes correlate most strongly with performance gains. We release these 1,000+ SFT models and benchmark results to accelerate further research. All resources are available at https://github.com/llm-jp/massive-sft.
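
The two quantities the abstract highlights, perplexity and layer-wise weight change, can be illustrated with a minimal sketch. The abstract does not specify the exact formulas used, so the definitions below are assumptions: perplexity is taken as the exponential of the negative mean token log-probability, and per-layer change as the relative L2 norm of the weight difference between a base model and its SFT counterpart. The function names `perplexity` and `relative_layer_change` are hypothetical and are not taken from the released repository.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp(-mean log p(token)), given natural-log token log-probabilities.

    Assumption: this is the standard definition; the paper's exact setup
    (which model scores which text) is not stated in the abstract.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_layer_change(base_layers, sft_layers):
    """Per-layer ||W_sft - W_base|| / ||W_base|| over flattened weight lists.

    Assumption: an illustrative measure of how far SFT moved each layer,
    not necessarily the metric used in the paper.
    """
    changes = []
    for base, sft in zip(base_layers, sft_layers):
        diff = math.sqrt(sum((s - b) ** 2 for b, s in zip(base, sft)))
        norm = math.sqrt(sum(b ** 2 for b in base))
        if norm == 0.0:
            raise ValueError("base layer has zero norm")
        changes.append(diff / norm)
    return changes


# Toy inputs: four tokens each assigned probability 0.5, and two tiny "layers".
ppl = perplexity([math.log(0.5)] * 4)
deltas = relative_layer_change(
    [[3.0, 4.0], [1.0, 0.0]],
    [[3.0, 4.0], [1.0, 1.0]],
)
```

In this toy example the first layer is unchanged and the second has moved by an amount equal to its original norm. With real checkpoints, the flattened lists would be replaced by each layer's parameter tensors.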
