The effectiveness of a language model is influenced by its tokenrepresentations, which must encode contextual information and handle the sameword form having a plurality of meanings (polysemy). Currently, none of thecommon language modelling architectures explicitly model polysemy. We propose alanguage model which not only predicts the next word, but also its sense incontext. We argue that this higher prediction granularity may be useful for endtasks such as assistive writing, and allow for more a precise linking oflanguage models with knowledge bases. We find that multi-sense languagemodelling requires architectures that go beyond standard language models, andhere propose a structured prediction framework that decomposes the task into aword followed by a sense prediction task. To aid sense prediction, we utilise aGraph Attention Network, which encodes definitions and example uses of wordsenses. Overall, we find that multi-sense language modelling is a highlychallenging task, and suggest that future work focus on the creation of moreannotated training datasets.