Abstract
Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this paper, we propose a paradigm shift by designing an architecture to memorize text directly, bearing in mind the principle that memorization precedes learning. We introduce MeMo, a novel architecture for language modeling that explicitly memorizes sequences of tokens in layered associative memories. By design, MeMo offers transparency and the possibility of model editing, including forgetting texts. We experimented with the MeMo architecture, showing the memorization power of the one-layer and the multi-layer configurations.
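The abstract does not say how MeMo's associative memories are built, so what follows is only a minimal sketch under stated assumptions. It uses a classical correlation-matrix (outer-product) associative memory, a swapped-in textbook technique and not the MeMo method. Each window of preceding tokens is encoded as a random, nearly orthogonal key vector, and the next token as a random value vector. The pair is stored by adding their outer product to a single matrix, and a text is forgotten by subtracting the same outer products. The class `AssociativeMemory`, its `dim`, `window` and `seed` parameters, and the key encoding (one random vector per (position, token) pair, summed and normalized) are all hypothetical. Only a single layer is modeled.

```python
import math
import random


class AssociativeMemory:
    """Hypothetical single-layer correlation-matrix memory (illustration only,
    not the MeMo implementation): maps a window of tokens to the next token."""

    def __init__(self, dim=512, window=2, seed=0):
        self.dim = dim
        self.window = window          # number of preceding tokens forming a key
        self.rng = random.Random(seed)
        self.key_vecs = {}            # (position, token) -> random key component
        self.out_vecs = {}            # token -> random value vector
        self.matrix = [[0.0] * dim for _ in range(dim)]

    def _rand_vec(self):
        # Gaussian entries with variance 1/dim: nearly orthogonal, ~unit norm.
        s = 1.0 / math.sqrt(self.dim)
        return [self.rng.gauss(0.0, s) for _ in range(self.dim)]

    def _component(self, table, name):
        # Create each random vector once, then reuse it.
        if name not in table:
            table[name] = self._rand_vec()
        return table[name]

    def _key(self, context):
        # Sum one vector per (position, token) pair, then normalize to length 1.
        key = [0.0] * self.dim
        for pos, tok in enumerate(context):
            comp = self._component(self.key_vecs, (pos, tok))
            key = [k + c for k, c in zip(key, comp)]
        norm = math.sqrt(sum(k * k for k in key))
        return [k / norm for k in key]

    def _update(self, context, token, sign):
        # matrix += sign * outer(value(token), key(context))
        key = self._key(context)
        val = self._component(self.out_vecs, token)
        for i in range(self.dim):
            vi = sign * val[i]
            self.matrix[i] = [m + vi * k for m, k in zip(self.matrix[i], key)]

    def _pairs(self, tokens):
        # Each window of `window` tokens is associated with the token after it.
        w = self.window
        return [(tuple(tokens[i:i + w]), tokens[i + w])
                for i in range(len(tokens) - w)]

    def memorize(self, tokens):
        for ctx, nxt in self._pairs(tokens):
            self._update(ctx, nxt, +1.0)

    def forget(self, tokens):
        # Subtracting the identical outer products removes the text
        # (up to floating-point rounding).
        for ctx, nxt in self._pairs(tokens):
            self._update(ctx, nxt, -1.0)

    def recall(self, context):
        """Return (best_token, score) for the token following `context`."""
        key = self._key(tuple(context))
        v = [sum(m * k for m, k in zip(row, key)) for row in self.matrix]
        scores = {t: sum(a * b for a, b in zip(vec, v))
                  for t, vec in self.out_vecs.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]
```

For example, after `mem.memorize("the cat sat on the mat".split())`, calling `mem.recall(["on", "the"])` should return `"mat"` with a score near 1, because the stored keys are nearly orthogonal in 512 dimensions. Calling `mem.forget` on the same tokens should drive the score for those contexts toward 0 while leaving other memorized texts recallable. Crosstalk between stored associations grows as their number approaches `dim`, so this sketch recalls reliably only well below that limit.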