Abstract
The goal of program synthesis, or code generation, is to generate executable code from a given description. Recently, a growing number of studies have employed reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, current representative works either rely solely on offline frameworks, which limits exploration of new sample spaces, or underutilize unit test signals by not accounting for the specific error locations within the code. To address these issues, we propose RLTF (Reinforcement Learning from Unit Test Feedback), a novel online RL framework that uses multi-granularity unit test feedback to refine code LLMs. Our approach generates data in real time during training and simultaneously uses fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and MBPP benchmarks. Our code is available at: https://github.com/Zyq-scut/RLTF.