MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning

Abstract

Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representation across different programming languages has been shown to be more effective than learning from single-language datasets, since more training data from multi-language datasets improves the model's ability to extract language-agnostic information from source code. However, existing multi-language models focus only on learning parameters shared among the different languages and overlook the language-specific information that is crucial for downstream tasks when training on multi-language datasets. To address this problem, we propose MetaTPTrans, a meta learning approach for multilingual code representation learning. MetaTPTrans generates different parameters for the feature extractor according to the specific programming language of the input source code snippet, enabling the model to learn both language-agnostic and language-specific information. Experimental results show that MetaTPTrans significantly improves the F1 score of state-of-the-art approaches by up to 2.40 percentage points for code summarization, a language-agnostic task, and the Top-1 (Top-5) prediction accuracy by up to 7.32 (13.15) percentage points for code completion, a language-specific task.
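
The idea of generating feature-extractor parameters from the input's programming language can be illustrated with a small, self-contained sketch. This is not the MetaTPTrans architecture: the abstract states only that parameters are generated per language, so everything else here is an assumption made for illustration, including the class name `LanguageConditionedProjection`, the single linear generator, the dimensions, and the language tags "python" and "go". A shared generator maps a per-language embedding to the weights of one projection layer, so the same input features are transformed differently depending on the language.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LanguageConditionedProjection:
    """Hypothetical layer whose weights are generated from a language embedding."""

    def __init__(self, languages, lang_dim, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        # One embedding vector per language (language-specific state).
        self.lang_embedding = {
            lang: [rng.uniform(-1.0, 1.0) for _ in range(lang_dim)]
            for lang in languages
        }
        # Generator shared by all languages: embedding -> flattened weights.
        self.generator = make_matrix(out_dim * in_dim, lang_dim, rng)

    def weights_for(self, language):
        """Generate the out_dim x in_dim weight matrix for one language."""
        flat = matvec(self.generator, self.lang_embedding[language])
        return [
            flat[r * self.in_dim:(r + 1) * self.in_dim]
            for r in range(self.out_dim)
        ]

    def forward(self, features, language):
        """Project features with the weights generated for this language."""
        return matvec(self.weights_for(language), features)


# Assumed language tags and sizes, chosen only for the demonstration.
layer = LanguageConditionedProjection(["python", "go"], lang_dim=4, in_dim=3, out_dim=2)
features = [1.0, 0.5, -0.5]
py_out = layer.forward(features, "python")
go_out = layer.forward(features, "go")
```

In this sketch the generator is the only component shared across languages, while each embedding is specific to one language, which mirrors the split between language-agnostic and language-specific information described above. In a trained model both would be learned; here they are random and untrained.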
