Abstract
We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge with only a few examples. Specifically, through simple supervised fine-tuning, our model, LIMO, achieves 63.3% accuracy on AIME24 and 95.6% on MATH500, surpassing previous fine-tuned models (6.5% on AIME24, 59.2% on MATH500) while using only 1% of the training data required by prior approaches. Furthermore, LIMO exhibits strong out-of-distribution generalization, achieving a 45.8% absolute improvement across diverse benchmarks and outperforming models trained on 100x more data. Synthesizing these findings, we propose the Less-Is-More Reasoning Hypothesis (LIMO Hypothesis): in foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning can emerge through minimal but strategically designed demonstrations of cognitive processes. This hypothesis suggests that the threshold for eliciting complex reasoning is not dictated by task complexity but rather by two key factors: (1) the completeness of the model's pre-trained knowledge base and (2) the effectiveness of post-training examples in serving as "cognitive templates" that guide reasoning.