Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

Abstract

Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks, yet their potential for code optimization remains underexplored. This work investigates whether LLMs can optimize the performance of assembly code, where fine-grained control over execution enables improvements that are difficult to express in high-level languages. We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO), guided by a reward function that considers both functional correctness, validated through test cases, and execution performance relative to the industry-standard compiler gcc -O3. To support this study, we introduce a benchmark of 8,072 real-world programs. Our model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate and an average speedup of 1.47x over the gcc -O3 baseline, outperforming all 20 other models evaluated, including Claude-3.7-sonnet. These results indicate that reinforcement learning can unlock the potential of LLMs to serve as effective optimizers of assembly code performance.
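The abstract does not state the exact reward formula. As a rough illustration only, the sketch below shows one plausible way to combine a correctness check with a speedup term measured against gcc -O3. The `Evaluation` fields, the -1.0 penalty for output that fails to assemble, and the zero reward for any failed test case are all hypothetical choices, not the authors' definition.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    # Hypothetical record of one evaluated candidate program.
    compiled: bool          # did the generated assembly assemble and link?
    tests_passed: int       # number of test cases the candidate passed
    tests_total: int        # total number of test cases for the program
    baseline_time: float    # runtime of the gcc -O3 binary (seconds)
    candidate_time: float   # runtime of the LLM-generated binary (seconds)


def reward(ev: Evaluation) -> float:
    """Hypothetical correctness-gated speedup reward (an assumption, not the paper's formula)."""
    # Assumed penalty: output that does not even assemble gets a negative reward.
    if not ev.compiled:
        return -1.0
    # Assumed gate: any failing test (or no tests at all) earns no speed credit.
    if ev.tests_total == 0 or ev.tests_passed < ev.tests_total:
        return 0.0
    # Correct programs are rewarded by their speedup over the gcc -O3 baseline.
    return ev.baseline_time / ev.candidate_time


if __name__ == "__main__":
    fast = Evaluation(True, 10, 10, baseline_time=2.0, candidate_time=1.0)
    wrong = Evaluation(True, 9, 10, baseline_time=2.0, candidate_time=0.5)
    broken = Evaluation(False, 0, 10, baseline_time=2.0, candidate_time=1.0)
    print(reward(fast), reward(wrong), reward(broken))
```

Gating the speedup term on passing every test is one way to keep a policy from learning to emit fast but incorrect code; a graded scheme that gives partial credit for passing some tests would be an equally plausible alternative.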
