Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

Abstract

In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define an anomaly as a code fragment that is different from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to the detection of anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies that arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques and discuss types of detected anomalies and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development.
