Although SGD requires shuffling the training data between epochs, currently none of the word-level language modeling systems do this. Naively shuffling all sentences in the training data would not permit the model to learn inter-sentence dependencies. Here we present a method that partially shuffles the training data between epochs. This method makes each batch random, while keeping most sentence ordering intact. It achieves new state-of-the-art results on word-level language modeling on both the Penn Treebank and WikiText-2 datasets.
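
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way to obtain batches that are random while most of the token order is preserved. It assumes the corpus is laid out as `batch_size` contiguous rows of tokens, as is common in word-level language modeling code, and that each row is rotated by an independent random offset at the start of every epoch. The names `batchify` and `partial_shuffle` are hypothetical and are not taken from the paper.

```python
import random


def batchify(tokens, batch_size):
    """Split a flat token list into batch_size contiguous rows of equal length.

    Trailing tokens that do not fill a complete row are dropped.
    Assumes len(tokens) >= batch_size so that no row is empty.
    """
    row_len = len(tokens) // batch_size
    return [tokens[i * row_len:(i + 1) * row_len] for i in range(batch_size)]


def partial_shuffle(rows, rng=random):
    """Rotate every row by its own random offset (assumed shuffling scheme).

    Within a row, token order is broken at only one point (the wrap-around),
    so nearly all neighbouring sentences stay adjacent, while the columns
    that form each batch differ from epoch to epoch.
    """
    shuffled = []
    for row in rows:
        split = rng.randrange(len(row))  # random cut point in [0, len(row))
        shuffled.append(row[split:] + row[:split])
    return shuffled


# Toy corpus of 20 token ids arranged as 4 rows of 5 tokens each.
rows = batchify(list(range(20)), batch_size=4)
rng = random.Random(0)  # seeded only to make the example reproducible
epoch_rows = partial_shuffle(rows, rng)
```

In a training loop under these assumptions, `partial_shuffle(rows)` would be called once per epoch on the original rows, before the rows are sliced into fixed-length segments for backpropagation through time.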