Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

  • 2018-03-13 10:30:55
  • Nghi D. Q. Bui, Lingxiao Jiang

Abstract

Translating a program written in one programming language to another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, they may be either specific to the language grammars, or specific to certain kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is twofold: first, we normalize and enrich code token streams with additional structural and semantic information, and train cross-language vector representations for the tokens (a.k.a. shared embeddings) based on word2vec, a neural-network-based technique for producing word embeddings; second, hierarchically from the bottom up, we construct shared embeddings for code elements at higher levels of granularity (e.g., expressions, statements, methods) from the embeddings of their constituents, and then build mappings among code elements across languages based on similarities among their embeddings. Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores. When compared with an existing tool for mapping library API methods, our approach identifies many more mappings accurately. The mapping results and code can be accessed at https://github.com/bdqnghi/hierarchical-programming-language-mapping. We believe that our idea for learning cross-language vector representations with code structural information can be a useful step towards automated program translation.
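To make the bottom-up idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a higher-level code element is embedded as the plain average of its constituents' vectors (the abstract does not specify the composition function) and that mappings are chosen by cosine similarity. The token names, the toy 3-dimensional vectors, and the helpers `compose`, `cosine`, and `best_match` are all hypothetical and introduced only for illustration.

```python
import math

# Hypothetical shared token embeddings: toy 3-d vectors, not learned values.
# In the paper these would come from a word2vec-style model trained on
# normalized Java and C# token streams.
TOKEN_VECS = {
    "return":            [0.00, 0.10, 0.90],
    "String.length":     [0.10, 0.90, 0.10],  # Java-side token
    "String.Length":     [0.12, 0.88, 0.08],  # C#-side token
    "Console.WriteLine": [0.88, 0.12, 0.02],  # C#-side token
}

def compose(tokens):
    """Embed a higher-level element as the mean of its constituents.

    Averaging is an assumption made for this sketch only.
    """
    vecs = [TOKEN_VECS[t] for t in tokens]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, candidates):
    """Return the candidate name whose vector is most similar to the query."""
    return max(candidates, key=lambda name: cosine(query_vec, candidates[name]))

# A Java statement and two candidate C# statements, each built from tokens.
java_stmt = compose(["return", "String.length"])
csharp_stmts = {
    "return s.Length;": compose(["return", "String.Length"]),
    "Console.WriteLine(s);": compose(["Console.WriteLine"]),
}

match = best_match(java_stmt, csharp_stmts)
print(match)
```

The same `compose` step could be applied again to statement vectors to obtain method-level vectors, which is the sense in which the construction is hierarchical.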

 
