MambaByte: Token-free Selective State Space Model

Abstract

Token-free language models learn directly from raw bytes and remove the bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences, and standard autoregressive Transformers scale poorly in such settings. We experiment with MambaByte, a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. Our experiments indicate the computational efficiency of MambaByte compared to other byte-level models. We also find MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers. Furthermore, owing to linear scaling in length, MambaByte benefits from fast inference compared to Transformers. Our findings establish the viability of MambaByte in enabling token-free language modeling.
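The two technical claims in this abstract (byte inputs produce longer sequences; a state space model processes them with work linear in length) can be illustrated with a toy sketch. This is NOT the MambaByte architecture: the class ToySelectiveSSM, its scalar byte embedding, the parameter names w_delta, w_B and w_C, and the simplified discretization of B are all hypothetical choices made for illustration. What it does show is the general selective-SSM pattern: the step size delta and the matrices B and C depend on the current input, and a fixed-size state h is updated once per byte.

```python
import math
import random


def to_bytes(text: str) -> list[int]:
    """Token-free input: each UTF-8 byte becomes an integer in [0, 255]."""
    return list(text.encode("utf-8"))


class ToySelectiveSSM:
    """Hypothetical toy diagonal selective SSM, NOT the MambaByte implementation.

    The recurrent state h has a fixed size d_state, so each byte costs
    O(d_state) work and memory no matter how long the prefix is; a sequence
    of L bytes therefore costs O(L * d_state) in total.
    """

    def __init__(self, d_state: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.d_state = d_state
        # Toy scalar embedding per byte value (real models use vectors).
        self.embed = [rng.uniform(-1.0, 1.0) for _ in range(256)]
        # Diagonal continuous-time matrix A with negative entries (stable decay).
        self.A = [-(i + 1.0) for i in range(d_state)]
        # Hypothetical projection weights making delta, B and C input-dependent.
        self.w_delta = rng.uniform(0.5, 1.5)
        self.w_B = [rng.uniform(-1.0, 1.0) for _ in range(d_state)]
        self.w_C = [rng.uniform(-1.0, 1.0) for _ in range(d_state)]

    def step(self, h: list[float], byte: int) -> tuple[list[float], float]:
        x = self.embed[byte]
        # Selectivity: step size, B and C all depend on the current input.
        delta = math.log1p(math.exp(self.w_delta * x))  # softplus, so delta > 0
        B = [w * x for w in self.w_B]
        C = [w * x for w in self.w_C]
        # Discretize: A_bar = exp(delta * A); B_bar approximated as delta * B.
        h = [math.exp(delta * a) * h_i + delta * b * x
             for a, h_i, b in zip(self.A, h, B)]
        y = sum(c * h_i for c, h_i in zip(C, h))
        return h, y

    def run(self, data: list[int]) -> tuple[list[float], list[float]]:
        h = [0.0] * self.d_state  # constant-size state, independent of len(data)
        ys = []
        for byte in data:  # one recurrent step per byte: linear in length
            h, y = self.step(h, byte)
            ys.append(y)
        return h, ys


text = "naïve café"
data = to_bytes(text)
print(len(text), len(data))  # 10 12 (ï and é take 2 bytes each in UTF-8)

model = ToySelectiveSSM()
state, outputs = model.run(data)
print(len(state), len(outputs))  # 4 12
```

Each call to step touches only the d_state entries of h, so the cost of generating one more byte stays flat as the sequence grows. By contrast, a Transformer decoding with a key/value cache attends over the whole prefix at every step, so its per-token cost grows with context length.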
