Abstract
The utilization of programming language (PL) models, pretrained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation, neglecting specific sequence-level features of code, including but not limited to compilability as well as syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that combines pretrained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning and employs execution feedback as the external source of knowledge into the model optimization. PPOCoder is transferable across different code generation tasks and PLs. Extensive experiments on three code generation tasks demonstrate the effectiveness of our proposed approach compared to SOTA methods, improving the success rate of compilation and functional correctness over different PLs. Our code can be found at https://github.com/reddy-lab-code-research/PPOCoder.