Deep Learning to Detect Redundant Method Comments

Abstract

Comments in software are critical for maintenance and reuse. But apart from prescriptive advice, there is little practical support or quantitative understanding of what makes a comment useful. In this paper, we introduce the task of identifying comments which are uninformative about the code they are meant to document. To address this problem, we introduce the notion of comment entailment from code, high entailment indicating that a comment's natural language semantics can be inferred directly from the code. Although not all entailed comments are low quality, comments that are too easily inferred, for example, comments that restate the code, are widely discouraged by authorities on software style. Based on this, we develop a tool called CRAIC which scores method-level comments for redundancy. Highly redundant comments can then be expanded or alternatively removed by the developer. CRAIC uses deep language models to exploit large software corpora without requiring expensive manual annotations of entailment. We show that CRAIC can perform the comment entailment task with good agreement with human judgements. Our findings also have implications for documentation tools. For example, we find that common tags in Javadoc are at least two times more predictable from code than non-Javadoc sentences, suggesting that Javadoc tags are less informative than more free-form comments.
