Interactive Code Generation via Test-Driven User-Intent Formalization

Abstract

Pre-trained large language models (LLMs) such as OpenAI Codex have shown immense potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, the code produced carries no correctness guarantees with respect to the user's intent. In fact, it is hard to define a notion of correctness, since natural language can be ambiguous and lacks a formal semantics. In this paper, we take a first step towards addressing this problem by proposing the workflow of test-driven user-intent formalization (TDUIF), which leverages lightweight user feedback to jointly (a) formalize the user intent as tests (a partial specification), and (b) generate code that meets the formal user intent. To perform a scalable, automated evaluation of the algorithms without requiring a user in the loop, we describe how to simulate user interaction with high fidelity using a reference solution. We also describe and implement alternative versions of several algorithmic components (including mutating and ranking a set of tests) that can be composed into efficient solutions to the TDUIF problem. We have developed a system, TICODER, that implements several solutions to TDUIF, and we compare their relative effectiveness on the MBPP academic code generation benchmark. Our results with the OpenAI Codex LLM on MBPP are promising: our best algorithm improves the pass@1 code generation accuracy metric from 48.39% to 70.49% with a single user query, and to as much as 85.48% with up to 5 user queries. In addition, we can generate a non-trivial functional unit test consistent with the user intent within an average of 1.69 user queries for 90.40% of the examples in this dataset.
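
To make the workflow concrete, the following is a minimal sketch of one possible TDUIF-style loop. It is not the TICODER implementation. It assumes that code candidates are plain Python callables, that a test is an (input, expected output) pair, that the simulated user approves a test exactly when the reference solution passes it, and that tests are ranked by how evenly they split the remaining candidates. All names (`passes`, `simulated_user`, `tduif_loop`, and the toy candidates) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Code = Callable[[int], int]
Test = Tuple[int, int]  # (input, expected output); an assumed test format


def passes(code: Code, test: Test) -> bool:
    """Return True if the candidate produces the expected output."""
    x, expected = test
    try:
        return code(x) == expected
    except Exception:
        return False


def simulated_user(reference: Code, test: Test) -> bool:
    """Assumed user model: approve a test iff the reference solution passes it."""
    return passes(reference, test)


def tduif_loop(
    codes: List[Code], tests: List[Test], reference: Code, max_queries: int
) -> Tuple[List[Code], List[Test]]:
    """Query the (simulated) user about tests and prune code candidates."""
    remaining = list(codes)
    pending = list(tests)
    approved: List[Test] = []
    for _ in range(max_queries):
        if not pending:
            break
        # Assumed ranking heuristic: prefer the test whose pass count is
        # closest to half of the remaining candidates.
        pending.sort(
            key=lambda t: abs(sum(passes(c, t) for c in remaining) - len(remaining) / 2)
        )
        test = pending.pop(0)
        if simulated_user(reference, test):
            # Approved test: it joins the partial specification, and only
            # candidates that satisfy it survive.
            approved.append(test)
            remaining = [c for c in remaining if passes(c, test)]
        else:
            # Rejected test: candidates that agree with it are discarded.
            remaining = [c for c in remaining if not passes(c, test)]
    return remaining, approved


# Toy stand-ins for LLM-generated candidates for the intent "square a number".
def square(x: int) -> int:
    return x * x


def double(x: int) -> int:
    return x * 2


def cube(x: int) -> int:
    return x ** 3


remaining, approved = tduif_loop(
    codes=[square, double, cube],
    tests=[(2, 4), (3, 9), (3, 6)],
    reference=square,
    max_queries=2,
)
```

In this toy run, the first query on the test (2, 4) eliminates `cube`, and the second query on (3, 9) eliminates `double`, leaving `square` together with two approved tests that serve as a partial specification of the intent.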
