funcGNN: A Graph Neural Network Approach to Program Similarity

  • 2020-07-30 17:19:16
  • Aravind Nair, Avijit Roy, Karl Meinke
  • 0

Abstract

Program similarity is a fundamental concept, central to the solution ofsoftware engineering tasks such as software plagiarism, clone identification,code refactoring and code search. Accurate similarity estimation betweenprograms requires an in-depth understanding of their structure, semantics andflow. A control flow graph (CFG), is a graphical representation of a programwhich captures its logical control flow and hence its semantics. A commonapproach is to estimate program similarity by analysing CFGs using graphsimilarity measures, e.g. graph edit distance (GED). However, graph editdistance is an NP-hard problem and computationally expensive, making theapplication of graph similarity techniques to complex software programsimpractical. This study intends to examine the effectiveness of graph neuralnetworks to estimate program similarity, by analysing the associated controlflow graphs. We introduce funcGNN, which is a graph neural network trained onlabeled CFG pairs to predict the GED between unseen program pairs by utilizingan effective embedding vector. To our knowledge, this is the first time graphneural networks have been applied on labeled CFGs for estimating the similaritybetween high-level language programs. Results: We demonstrate the effectivenessof funcGNN to estimate the GED between programs and our experimental analysisdemonstrates how it achieves a lower error rate (0.00194), with faster (23times faster than the quickest traditional GED approximation method) and betterscalability compared with the state of the art methods. funcGNN posses theinductive learning ability to infer program structure and generalise to unseenprograms. The graph embedding of a program proposed by our methodology could beapplied to several related software engineering problems (such as codeplagiarism and clone identification) thus opening multiple research directions.

 

Quick Read (beta)

loading the full paper ...