xASTNN: Improved Code Representations for Industrial Practice

  • 2023-11-06 03:06:37
  • Zhiwei Xu, Min Zhou, Xibin Zhao, Yang Chen, Xi Cheng, Hongyu Zhang


The application of deep learning techniques in software engineering is becoming increasingly popular. One key problem is developing high-quality and easy-to-use source code representations for code-related tasks. The research community has achieved impressive results in recent years. However, due to deployment difficulties and performance bottlenecks, these approaches are seldom applied in industry. In this paper, we present xASTNN, an eXtreme Abstract Syntax Tree (AST)-based Neural Network for source code representation, aiming to push this technique to industrial practice. The proposed xASTNN has three advantages. First, xASTNN is completely based on widely-used ASTs and does not require complicated data pre-processing, making it applicable to various programming languages and practical scenarios. Second, three closely-related designs are proposed to guarantee the effectiveness of xASTNN: a statement subtree sequence for code naturalness, a gated recursive unit for syntactical information, and a gated recurrent unit for sequential information. Third, a dynamic batching algorithm is introduced to significantly reduce the time complexity of xASTNN. Two code comprehension downstream tasks, code classification and code clone detection, are adopted for evaluation. The results demonstrate that xASTNN can improve on the state of the art while being faster than the baselines.
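To make the idea of a statement subtree sequence concrete, the following is a minimal sketch that uses Python's standard `ast` module as a stand-in parser. The abstract does not specify the splitting rules xASTNN actually uses, so the function name `statement_subtrees` and the rule "cut each subtree at nested statements" are assumptions for illustration only, not the authors' implementation.

```python
import ast


def statement_subtrees(source):
    """Split a program's AST into a preorder sequence of statement subtrees.

    Hypothetical helper: each subtree is flattened to a list of node type
    names, and a subtree stops wherever a nested statement begins.
    """
    tree = ast.parse(source)
    sequence = []

    def node_types(node):
        # Collect type names for this node and its non-statement descendants.
        names = [type(node).__name__]
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.stmt):
                names.extend(node_types(child))
        return names

    def visit(node):
        # Preorder traversal: emit a subtree each time a statement is reached.
        if isinstance(node, ast.stmt):
            sequence.append(node_types(node))
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


if __name__ == "__main__":
    example = "x = 1\nif x:\n    y = x + 2\n"
    for subtree in statement_subtrees(example):
        print(subtree)
```

In a model of the kind the abstract describes, each such subtree would be encoded by a recursive unit, and the resulting sequence of statement vectors would then be passed to a recurrent unit; those encoders are not shown here.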

