Abstract
Effective reasoning is crucial to solving complex mathematical problems. Recent large language models (LLMs) have boosted performance by scaling test-time computation through long chain-of-thought reasoning. However, transformer-based models are inherently limited in extending context length due to their quadratic computational complexity and linear memory requirements. In this paper, we introduce a novel hybrid linear RNN reasoning model, M1, built on the Mamba architecture, which allows memory-efficient inference. Our approach leverages a distillation process from existing reasoning models and is further enhanced through RL training. Experimental results on the AIME and MATH benchmarks show that M1 not only outperforms previous linear RNN models but also matches the performance of state-of-the-art DeepSeek R1 distilled reasoning models at a similar scale. We also compare our generation speed with a highly performant general-purpose inference engine, vLLM, and observe more than a 3x speedup compared to a same-size transformer. With this throughput speedup, we are able to achieve higher accuracy than DeepSeek R1 distilled transformer reasoning models under a fixed generation time budget using self-consistency voting. Overall, we introduce a hybrid Mamba reasoning model and provide a more effective approach to scaling test-time generation using self-consistency or long chain-of-thought reasoning.
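
As a rough illustration of why throughput matters for self-consistency voting under a fixed time budget, the sketch below counts how many reasoning chains fit in the budget and takes a majority vote over their final answers. The function names, the budget, and the per-sample timings are hypothetical and chosen only for illustration; they are not taken from the paper's implementation or measurements.

```python
from collections import Counter


def samples_within_budget(budget_s: float, seconds_per_sample: float) -> int:
    """Number of complete reasoning chains that fit in a fixed time budget."""
    if seconds_per_sample <= 0:
        raise ValueError("seconds_per_sample must be positive")
    return int(budget_s // seconds_per_sample)


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer.

    Ties are broken in favor of the answer that was sampled first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


# Hypothetical timings: a model that generates 3x faster fits 3x as many
# chains into the same budget, giving the vote more samples to work with.
budget_s = 60.0
slow_n = samples_within_budget(budget_s, seconds_per_sample=6.0)
fast_n = samples_within_budget(budget_s, seconds_per_sample=2.0)

# Hypothetical final answers extracted from five sampled chains.
sampled_answers = ["42", "17", "42", "8", "42"]
voted = majority_vote(sampled_answers)
```

In this toy setting the faster model draws three times as many chains as the slower one within the same budget, which is the mechanism by which a throughput gain can translate into higher voted accuracy.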