Abstract
Recurrent neural networks are nowadays used successfully in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks have their own implementation of training and sampling procedures for recurrent neural networks, although there are in fact multiple other possibilities to choose from and other parameters to tune. The existing literature very often overlooks or ignores this. In this paper we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks that solve the task of predicting the next token in a given sequence. We test these schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to involve a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but it is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
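The hidden-state transfer mentioned above can be illustrated with a minimal sketch. It assumes a toy vanilla RNN (`h' = tanh(W_xh x + W_hh h + b)`) with random, untrained weights. The helpers `init_params`, `step`, `run` and `chunked_final_states` are hypothetical names introduced only for this illustration and are not the paper's implementation. The sketch compares two ways of starting each subsequence: from a zero state, or from the final hidden state of the previous subsequence. It only covers the forward pass. In truncated backpropagation through time, a transferred state would additionally be treated as a constant, so that gradients do not flow across subsequence boundaries.

```python
import math
import random


def init_params(vocab_size, hidden_size, seed=0):
    """Hypothetical toy vanilla RNN with random, untrained weights."""
    rng = random.Random(seed)
    w_xh = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)] for _ in range(hidden_size)]
    w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    return w_xh, w_hh, b_h


def step(params, h, x_index):
    """One RNN step h' = tanh(W_xh x + W_hh h + b_h) for a one-hot input x."""
    w_xh, w_hh, b_h = params
    return [
        math.tanh(w_xh[i][x_index] + sum(w_hh[i][j] * h[j] for j in range(len(h))) + b_h[i])
        for i in range(len(h))
    ]


def run(params, h, indices):
    """Feed a sequence of character indices, starting from hidden state h."""
    for idx in indices:
        h = step(params, h, idx)
    return h


def chunked_final_states(params, indices, chunk_len, carry_state):
    """Process the sequence in subsequences of length chunk_len.

    carry_state=True  -> each subsequence starts from the previous final state.
    carry_state=False -> each subsequence starts from a zero state.
    Returns the final hidden state of every subsequence.
    """
    zero = [0.0] * len(params[2])
    h = zero
    finals = []
    for start in range(0, len(indices), chunk_len):
        chunk = indices[start:start + chunk_len]
        h = run(params, h if carry_state else zero, chunk)
        finals.append(h)
    return finals


if __name__ == "__main__":
    text = "hello world"
    vocab = sorted(set(text))
    indices = [vocab.index(c) for c in text]
    params = init_params(len(vocab), hidden_size=8, seed=1)

    full = run(params, [0.0] * 8, indices)  # one pass over the whole sequence
    carried = chunked_final_states(params, indices, 4, carry_state=True)
    reset = chunked_final_states(params, indices, 4, carry_state=False)

    print(carried[-1] == full)  # True: same computation as the single pass
    print(reset[-1] == full)    # resetting discards context from earlier chunks
```

With state transfer, the chunked forward pass reproduces the single pass exactly. With zero initialization, each subsequence loses the context of the preceding ones. The sketch says nothing about the training instability reported above. It only shows what "correctly initializing the model on subsequences" means at the level of the forward computation.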