Abstract
Transformers have generally supplanted recurrent neural networks as the dominant architecture for both natural language processing tasks and for modeling the effect of predictability on online human language comprehension. However, two recently developed recurrent model architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.