Fine-Tuning Code Language Models to Detect Cross-Language Bugs

Abstract

Multilingual programming, which involves using multiple programming languages (PLs) in a single project, is increasingly common due to its benefits. However, it introduces cross-language bugs (CLBs), which arise from interactions between different PLs and are difficult for single-language bug detection tools to detect. This paper investigates the potential of pre-trained code language models (CodeLMs) in CLB detection. We developed CLCFinder, a cross-language code identification tool, and constructed a CLB dataset involving three PL combinations (Python-C/C++, Java-C/C++, and Python-Java) with nine interaction types. We fine-tuned 13 CodeLMs on this dataset and evaluated their performance, analyzing the effects of dataset size, token sequence length, and code comments. Results show that all CodeLMs performed poorly before fine-tuning but exhibited varying degrees of improvement after fine-tuning, with UniXcoder-base achieving the best F1 score (0.7407). Notably, small fine-tuned CodeLMs tended to perform better than large ones. CodeLMs fine-tuned on single-language bug datasets performed poorly on CLB detection, demonstrating the distinction between CLBs and single-language bugs. Additionally, increasing the fine-tuning dataset size significantly improved performance, while longer token sequences did not necessarily improve model performance. The impact of code comments varied across models: some fine-tuned CodeLMs improved, while others degraded.
