Learning Performance-Improving Code Edits

Abstract

With the decline of Moore's law, optimizing program performance has become amajor focus of software research. However, high-level optimizations such as APIand algorithm changes remain elusive due to the difficulty of understanding thesemantics of code. Simultaneously, pretrained large language models (LLMs) havedemonstrated strong capabilities at solving a wide range of programming tasks.To that end, we introduce a framework for adapting LLMs to high-level programoptimization. First, we curate a dataset of performance-improving edits made byhuman programmers of over 77,000 competitive C++ programming submission pairs,accompanied by extensive unit tests. A major challenge is the significantvariability of measuring performance on commodity hardware, which can lead tospurious "improvements." To isolate and reliably evaluate the impact of programoptimizations, we design an environment based on the gem5 full systemsimulator, the de facto simulator used in academia and industry. Next, wepropose a broad range of adaptation strategies for code optimization; forprompting, these include retrieval-based few-shot prompting andchain-of-thought, and for finetuning, these include performance-conditionedgeneration and synthetic data augmentation based on self-play. A combination ofthese techniques achieves a mean speedup of 6.86 with eight generations, higherthan average optimizations from individual programmers (3.66). Using ourmodel's fastest generations, we set a new upper limit on the fastest speeduppossible for our dataset at 9.64 compared to using the fastest humansubmissions available (9.56).

Quick Read (beta)

loading the full paper ...