In the last decade, the field of neural language modelling has undergone enormous changes, driven by novel models built on the Transformer architecture. However, even these models struggle to model long sequences because of memory constraints and computational complexity that grows with sequence length. Coreference annotations over the training data can provide context that reaches far beyond what such language models can capture on their own. In this paper we present an extension of the Transformer-block architecture used in neural language models, specifically in GPT2, that incorporates entity annotations during training. Our model, GPT2E, extends the Transformer layers of GPT2 to Entity-Transformers, an architecture designed to make use of coreference information when it is present. As a result, we obtain richer representations for entity mentions at negligible additional training cost. We compare GPT2 and GPT2E in terms of perplexity on the CoNLL 2012 and LAMBADA datasets. We also examine the key differences between their entity representations and how these differences affect downstream tasks such as Named Entity Recognition. Furthermore, our approach can be adopted by most Transformer-based language models.