Current dialogue systems are not very engaging for users, especially when trained end-to-end without relying on proactive, scripted re-engagement strategies. Zhang et al. (2018) showed that the engagement level of end-to-end dialogue models increases when they are conditioned on text personas that give the model a personalized back-story. However, the dataset used in Zhang et al. (2018) is synthetic and of limited size, as it contains only around 1k different personas. In this paper we introduce a new dataset providing 5 million personas and 700 million persona-based dialogues. Our experiments show that, at this scale, training with personas still improves the performance of end-to-end systems. In addition, we show that other tasks benefit from the wide coverage of our dataset: fine-tuning our model on the data from Zhang et al. (2018) achieves state-of-the-art results.