Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?

Abstract

Informal natural language that describes code functionality, such as codecomments or function documentation, may contain substantial information about aprograms intent. However, there is typically no guarantee that a programsimplementation and natural language documentation are aligned. In the case of aconflict, leveraging information in code-adjacent natural language has thepotential to enhance fault localization, debugging, and code trustworthiness.In practice, however, this information is often underutilized due to theinherent ambiguity of natural language which makes natural language intentchallenging to check programmatically. The emergent abilities of Large LanguageModels (LLMs) have the potential to facilitate the translation of naturallanguage intent to programmatically checkable assertions. However, it isunclear if LLMs can correctly translate informal natural languagespecifications into formal specifications that match programmer intent.Additionally, it is unclear if such translation could be useful in practice. Inthis paper, we describe nl2postcond, the problem of leveraging LLMs fortransforming informal natural language to formal method postconditions,expressed as program assertions. We introduce and validate metrics to measureand compare different nl2postcond approaches, using the correctness anddiscriminative power of generated postconditions. We then use qualitative andquantitative methods to assess the quality of nl2postcond postconditions,finding that they are generally correct and able to discriminate incorrectcode. Finally, we find that nl2postcond via LLMs has the potential to behelpful in practice; nl2postcond generated postconditions were able to catch 64real-world historical bugs from Defects4J.

Quick Read (beta)

loading the full paper ...