Abstract
Autoformalization is one of the central tasks in formal verification, whileits advancement remains hindered due to the data scarcity and the absenceefficient methods. In this work we propose \textbf{FormaRL}, a simple yetefficient reinforcement learning framework for autoformalization which onlyrequires a small amount of unlabeled data. FormaRL integrates syntax check fromLean compiler and consistency check from large language model to calculate thereward, and adopts GRPO algorithm to update the formalizer. We also curated aproof problem dataset from undergraduate-level math materials, named\textbf{uproof}, in the hope to facilitate the exploration of autoformalizationand theorem proving in advanced math. Experiments show that FormaRL canincrease the pass@1 autoformalization accuracy of Qwen2.5-Coder-7B-Instruct by4 $\sim$ 6x (4.04\% $\to$ 26.15\% on ProofNet and 2.4\% $\to$ 9.6\% on uproof)with merely 859 unlabeled data. And on uproof our method also achieved a strongimprovement in out-of-distribution performance compared to existing open-sourcestate-of-the-art autoformalizers on both pass@1 accuracy (6.2\% $\to$ 9.6\%)and pass@16 accuracy (24.4\% $\to$ 33.6\%). Training code of FormaRL isopen-sourced at https://github.com/THUNLP-MT/FormaRL.