Abstract
Reinforcement learning (RL) has become a key component in training large language reasoning models (LLMs). However, recent studies question its effectiveness in improving multi-step reasoning, particularly on hard problems. To address this challenge, we propose a simple yet effective strategy via Question Augmentation: introducing partial solutions during training to reduce problem difficulty and provide more informative learning signals. Our method, QuestA, when applied during RL training on math reasoning tasks, improves not only pass@1 but also pass@k, particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks using 1.5B-parameter models: 67.1% (+5.3%) on AIME24, 59.5% (+10.0%) on AIME25, and 35.5% (+4.0%) on HMMT25. Further, we provide theoretical explanations that QuestA improves sample efficiency, offering a practical and generalizable pathway for expanding reasoning capability through RL.
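
As a rough illustration of the question-augmentation idea, the sketch below appends the first part of a reference solution to a question before it is used as an RL training prompt. The function name `augment_question`, the `hint_fraction` parameter, and the prompt layout are all hypothetical choices made for this example; the abstract does not specify how partial solutions are selected or formatted.

```python
def augment_question(question: str, solution_steps: list[str], hint_fraction: float) -> str:
    """Return the question with a prefix of the reference solution appended.

    Assumption: the reference solution is already split into ordered steps,
    and `hint_fraction` in [0, 1] controls how many leading steps are revealed.
    """
    if not 0.0 <= hint_fraction <= 1.0:
        raise ValueError("hint_fraction must be in [0, 1]")

    # Number of leading solution steps to reveal (rounded down).
    n_hint = int(len(solution_steps) * hint_fraction)
    if n_hint == 0:
        # No hint: the model sees the original, unaugmented question.
        return question

    hint = "\n".join(solution_steps[:n_hint])
    return f"{question}\n\nPartial solution:\n{hint}"
```

Under these assumptions, a larger `hint_fraction` yields an easier prompt, and setting it to 0 recovers the original question.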