Abstract
Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component's effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance.
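The validation and selection step described above (sample several responses per task, pair each with test cases, execute them, keep the ones that pass) can be illustrated with a minimal sketch. This is not the SelfCodeAlign implementation: the names `passes_tests` and `select_passing` are hypothetical, a subprocess with a timeout merely stands in for a real sandbox, the candidate responses are hard-coded rather than sampled from a model, and keeping every passing response is an assumption made here for simplicity.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(response_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Run a candidate response together with its paired tests in a child process.

    Assumption: a subprocess with a timeout is only a stand-in for a sandbox;
    it provides no real isolation from the host system.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(response_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # treat hangs as failures
        # A failed assert raises, so the interpreter exits with a non-zero code.
        return result.returncode == 0


def select_passing(task: str, candidates: list[tuple[str, str]]) -> list[dict]:
    """Keep (instruction, response) pairs whose paired tests pass."""
    return [
        {"instruction": task, "response": code}
        for code, tests in candidates
        if passes_tests(code, tests)
    ]


# Hypothetical task with two hand-written candidates (no model is called here).
task = "Write a function add(a, b) that returns the sum of two integers."
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

selected = select_passing(task, [(good, tests), (bad, tests)])
```

In this toy run only the correct candidate survives the filter, so `selected` holds a single instruction-response pair of the kind that would be used for instruction tuning.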