StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Abstract

The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback to explore the output space of LLMs and enhance code generation quality. However, the lengthy code that LLMs generate in response to complex human requirements makes RL exploration a challenge. Also, since the unit tests may not cover all of the complicated code, optimizing LLMs on these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO provides Fine-Grained Optimization by masking the unexecuted code segments so that the model is optimized only on executed code. In addition, we construct the APPS+ dataset for RL training, which is manually verified to ensure the correctness of unit tests. Experimental results show that our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks. Our dataset APPS+ and StepCoder are available online.
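
To make the masking idea behind FGO concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each generated token already has a per-token loss and a flag saying whether the token belongs to code that the unit tests actually executed. The function name `masked_mean_loss` and both parameters are hypothetical, and the abstract does not specify the actual objective or how the execution mask is obtained.

```python
from typing import Sequence


def masked_mean_loss(token_losses: Sequence[float],
                     executed_mask: Sequence[bool]) -> float:
    """Average per-token losses over executed tokens only.

    Hypothetical helper: tokens whose mask entry is False (assumed to lie in
    code the unit tests never ran) contribute nothing to the result.
    """
    if len(token_losses) != len(executed_mask):
        raise ValueError("token_losses and executed_mask must have equal length")

    # Keep only the losses of tokens flagged as executed.
    kept = [loss for loss, keep in zip(token_losses, executed_mask) if keep]

    # If nothing was executed there is no signal to optimize on.
    if not kept:
        return 0.0
    return sum(kept) / len(kept)
```

Under these assumptions, a call such as `masked_mean_loss([1.0, 2.0, 3.0, 4.0], [True, False, True, False])` averages only the first and third losses, so tokens in unexecuted segments do not influence the update.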
