Towards Formal Verification of LLM-Generated Code from Natural Language Prompts

Abstract

In the past few years LLMs have emerged as a tool that can aid programmers bytaking natural language descriptions and generating code based on it. However,LLMs often generate incorrect code that users need to fix and the literaturesuggests users often struggle to detect these errors. In this work we seek tooffer formal guarantees of correctness to LLM generated code; such guaranteescould improve the experience of using AI Code Assistants and potentially enablenatural language programming for users with little or no programming knowledge.To address this challenge we propose to incorporate a formal query languagethat can represent a user's intent in a formally defined but naturallanguage-like manner that a user can confirm matches their intent. Then, usingsuch a query we propose to verify LLM generated code to ensure it matches theuser's intent. We implement these ideas in our system, Astrogator, for theAnsible programming language which includes such a formal query language, acalculus for representing the behavior of Ansible programs, and a symbolicinterpreter which is used for the verification. On a benchmark suite of 21code-generation tasks, our verifier is able to verify correct code in 83% ofcases and identify incorrect code in 92%.

Quick Read (beta)

loading the full paper ...