code2seq: Generating Sequences from Structured Representations of Code

  • 2018-10-10 19:15:15
  • Uri Alon, Omer Levy, Eran Yahav

Abstract

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present code2seq: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of compositional paths in its abstract syntax tree (AST) and uses attention to select the relevant paths while decoding. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as state-of-the-art NMT models.
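To make the idea of representing a snippet as a set of AST paths more concrete, the following is a minimal sketch and not the authors' extractor. It assumes that a "path" means the sequence of node types connecting two leaves of the tree through their lowest common ancestor, and it uses Python's standard `ast` module (Python 3.8 or later) purely for illustration. The helper names `collect_leaves`, `token`, `path_between`, and `extract_paths` are hypothetical. The attention mechanism that selects among the paths during decoding is not shown.

```python
import ast
from itertools import combinations


def collect_leaves(tree):
    """Return one root-to-leaf trail (list of AST nodes) per leaf of the tree."""
    leaves = []

    def walk(node, trail):
        trail = trail + [node]
        # Assumption: Load/Store context markers are ignored so that
        # identifiers such as `x` count as leaves.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if not children:
            leaves.append(trail)
        for child in children:
            walk(child, trail)

    walk(tree, [])
    return leaves


def token(node):
    """Surface token for a leaf: identifier, literal, or the node type name."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return type(node).__name__


def path_between(a, b):
    """Node-type path from leaf trail `a` up to the common ancestor, then down to `b`."""
    k = 0
    while k < min(len(a), len(b)) and a[k] is b[k]:
        k += 1
    up = [type(n).__name__ for n in reversed(a[k:])]
    top = [type(a[k - 1]).__name__]  # lowest common ancestor
    down = [type(n).__name__ for n in b[k:]]
    return up + top + down


def extract_paths(source):
    """Return (start_token, path, end_token) for every pair of leaves."""
    leaves = collect_leaves(ast.parse(source))
    return [(token(a[-1]), path_between(a, b), token(b[-1]))
            for a, b in combinations(leaves, 2)]


if __name__ == "__main__":
    for start, path, end in extract_paths("x = y + 1"):
        print(start, " ".join(path), end)
```

In this sketch each snippet becomes an unordered collection of (token, path, token) triples rather than a flat token sequence; a model in the spirit of the abstract would encode each triple and let the decoder attend over them.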

 
