Automatic Code Generation using Pre-Trained Language Models

Abstract

Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly structured environment with strict syntax rules. Specifically, we propose an end-to-end machine learning model for code generation in the Python language built on top of pre-trained language models. We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46\% over a reasonable sequence-to-sequence baseline. All results and related code used for training and data processing are available on GitHub.
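
As a reading aid for the headline numbers, the short sketch below recovers the baseline BLEU score implied by the abstract. It assumes the 46% figure is a relative improvement over the baseline, which the abstract does not state explicitly; the function names are illustrative and do not come from the paper's code.

```python
def implied_baseline(score: float, relative_gain: float) -> float:
    """Return the baseline b such that score == b * (1 + relative_gain)."""
    return score / (1.0 + relative_gain)


def relative_improvement(score: float, baseline: float) -> float:
    """Return the fractional gain of score over baseline."""
    return (score - baseline) / baseline


# Figures quoted in the abstract.
fine_tuned_bleu = 0.22
reported_gain = 0.46  # assumption: interpreted as a relative gain

# Baseline BLEU implied by the two reported figures.
baseline_bleu = implied_baseline(fine_tuned_bleu, reported_gain)
print(f"implied baseline BLEU: {baseline_bleu:.3f}")
```

Under that assumption, the sequence-to-sequence baseline would sit at a BLEU score of roughly 0.15.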
