Abstract
Autoformalization, the task of automatically translating natural languagedescriptions into a formal language, poses a significant challenge acrossvarious domains, especially in mathematics. Recent advancements in largelanguage models (LLMs) have unveiled their promising capabilities to formalizeeven competition-level math problems. However, we observe a considerablediscrepancy between pass@1 and pass@k accuracies in LLM-generatedformalizations. To address this gap, we introduce a novel framework that scoresand selects the best result from k autoformalization candidates based on twocomplementary self-consistency methods: symbolic equivalence and semanticconsistency. Elaborately, symbolic equivalence identifies the logicalhomogeneity among autoformalization candidates using automated theorem provers,and semantic consistency evaluates the preservation of the original meaning byinformalizing the candidates and computing the similarity between theembeddings of the original and informalized texts. Our extensive experiments onthe MATH and miniF2F datasets demonstrate that our approach significantlyenhances autoformalization accuracy, achieving up to 0.22-1.35x relativeimprovements across various LLMs and baseline methods.