Abstract
C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually written interfaces in safe Rust as well as test cases that can be used to validate correctness of the transpilation. By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files. The provided Rust interfaces serve as explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness. We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem for various state-of-the-art methods and techniques. We also provide insights into the errors LLMs usually make in transpiling code from C to safe Rust. The best-performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting. Improvements on CRUST-Bench would lead to improved transpilation systems that can reason about complex scenarios and help in migrating legacy codebases from C into languages like Rust that ensure memory safety. You can find the dataset and code at https://github.com/anirudhkhatry/CRUST-bench.