DeciMamba: Exploring the Length Extrapolation Potential of Mamba

Abstract

Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are significantly longer than the ones seen during training, while enjoying faster inference.
