CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

Abstract

Currently, a growing number of mature natural language processing applications make people's lives more convenient. Such applications are built with source code, the language of software engineering. However, applications that understand source code to ease the software engineering process are under-researched. At the same time, the transformer model, especially in combination with transfer learning, has proven to be a powerful technique for natural language processing tasks. These breakthroughs point to a promising direction for processing source code and cracking software engineering tasks. This paper describes CodeTrans, an encoder-decoder transformer model for tasks in the software engineering domain, and explores the effectiveness of encoder-decoder transformer models on six software engineering tasks comprising thirteen sub-tasks. Moreover, we have investigated the effect of different training strategies, including single-task learning, transfer learning, multi-task learning, and multi-task learning with fine-tuning. CodeTrans outperforms the state-of-the-art models on all the tasks. To expedite future work in the software engineering domain, we have published our pre-trained CodeTrans models at https://github.com/agemagician/CodeTrans.
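The four training strategies differ mainly in which data the model sees and in what order. The following is a schematic sketch of that difference, not the CodeTrans training code. It assumes a T5-style convention in which a task name is prepended to each input so that one shared encoder-decoder model can tell tasks apart. The task names, the prefix format, the stage labels, and the functions `with_prefix`, `mix_tasks`, and `schedule` are all hypothetical illustrations.

```python
import random
from typing import Dict, List, Optional, Tuple

# A training example is a (source text, target text) pair.
Example = Tuple[str, str]


def with_prefix(task: str, examples: List[Example]) -> List[Example]:
    """Prepend the task name (hypothetical format) so one model can tell tasks apart."""
    return [(f"{task}: {src}", tgt) for src, tgt in examples]


def mix_tasks(datasets: Dict[str, List[Example]], seed: int = 0) -> List[Example]:
    """Pool every task's prefixed examples into one shuffled training stream."""
    pooled: List[Example] = []
    for task, examples in datasets.items():
        pooled.extend(with_prefix(task, examples))
    random.Random(seed).shuffle(pooled)
    return pooled


def schedule(strategy: str,
             datasets: Dict[str, List[Example]],
             target: str,
             unlabeled: Optional[List[str]] = None) -> List[Tuple[str, list]]:
    """Return the ordered (stage, data) pairs a given strategy would train on."""
    target_data = with_prefix(target, datasets[target])
    if strategy == "single_task":
        # Train only on the target task.
        return [("train", target_data)]
    if strategy == "transfer":
        # Self-supervised pre-training on unlabeled code, then fine-tune on the task.
        return [("pretrain", unlabeled or []), ("fine_tune", target_data)]
    if strategy == "multi_task":
        # Train one model on all tasks at once.
        return [("train", mix_tasks(datasets))]
    if strategy == "multi_task_fine_tune":
        # Multi-task training first, then fine-tune on the target task alone.
        return [("train", mix_tasks(datasets)), ("fine_tune", target_data)]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Hypothetical placeholder tasks with one toy example each.
    data = {
        "code_documentation": [("def add(a, b): return a + b", "Add two numbers.")],
        "commit_generation": [("- x = 1\n+ x = 2", "Change x to 2.")],
    }
    for stage, examples in schedule("multi_task_fine_tune", data, "code_documentation"):
        print(stage, len(examples))  # prints "train 2", then "fine_tune 1"
```

In this sketch, multi-task learning with fine-tuning is simply the multi-task stage followed by the single-task stage, which is why it can be expressed by reusing the other two building blocks.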
