Abstract
Recent research has established a connection between modern Hopfield networks (HNs) and transformer attention heads, with guarantees of exponential storage capacity. However, these models still face challenges scaling storage efficiently. Inspired by psychological theories of continuous neural resource allocation in working memory, we propose an approach that compresses large discrete Hopfield memories into smaller, continuous-time memories. Leveraging continuous attention, our new energy function modifies the update rule of HNs, replacing the traditional softmax-based probability mass function with a probability density over the continuous memory. This formulation aligns with modern perspectives on human executive function, offering a principled link between attractor dynamics in working memory and resource-efficient memory allocation. Our framework maintains competitive performance with HNs while leveraging a compressed memory, reducing computational costs across synthetic and video datasets.