Structural Language Models of Code

Abstract

We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree - structural language modeling (SLM). SLM estimates the probability of the program's abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques that have severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured approaches in generating Java and C# code. Our code, data, and trained models are available at http://github.com/tech-srl/slm-code-generation/ . An online demo is available at http://AnyCodeGen.org .
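The decomposition described above can be illustrated with a minimal sketch. It is not the model from the paper: the names `Node`, `preorder`, `tree_log_prob`, and `uniform_cond_prob` are hypothetical, the pre-order traversal is an assumed node ordering, and the context is simplified to a flat prefix of earlier node labels rather than the set of AST paths leading to the target node. The constant-probability function stands in for the neural model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Node:
    """A toy AST node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder(root: Node) -> Iterator[Node]:
    """Yield nodes depth-first, left to right (assumed generation order)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(node.children))


def tree_log_prob(
    root: Node,
    cond_prob: Callable[[str, Tuple[str, ...]], float],
) -> float:
    """Chain rule: log P(tree) = sum over nodes of log P(node | earlier nodes)."""
    context: List[str] = []
    total = 0.0
    for node in preorder(root):
        total += math.log(cond_prob(node.label, tuple(context)))
        context.append(node.label)
    return total


def uniform_cond_prob(label: str, context: Tuple[str, ...]) -> float:
    """Placeholder for a learned model: assigns 0.25 to every node."""
    return 0.25


# Toy AST for the expression `x > 1`.
tree = Node("Greater", [
    Node("Name", [Node("x")]),
    Node("IntLiteral", [Node("1")]),
])

log_p = tree_log_prob(tree, uniform_cond_prob)
print(log_p)
```

Working in log space turns the product of conditional probabilities into a sum, which avoids numerical underflow on larger trees. A learned model would replace `uniform_cond_prob` with a function whose output depends on the context it is given.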
