Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state of the art. This paper presents a large-scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and apply it to four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, with power-law exponents (the "steepness" of the learning curve) that are yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
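To make the notion of a power-law learning curve concrete, the following is a minimal sketch, assuming generalization error can be approximated as error ≈ alpha * m^beta, where m is the training set size and beta is the (negative) power-law exponent. The function name fit_power_law, the choice of a log-log least-squares fit, and the synthetic numbers are illustrative assumptions, not the paper's methodology or data.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit errors ~ alpha * train_sizes**beta via least squares in log-log space.

    Hypothetical helper for illustration only. A power law is a straight
    line in log-log coordinates, so the slope of that line is the exponent.
    """
    log_m = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    beta, log_alpha = np.polyfit(log_m, log_e, deg=1)
    return float(np.exp(log_alpha)), float(beta)


# Synthetic, noise-free learning curve (assumed values, not measured results).
sizes = np.array([1e4, 1e5, 1e6, 1e7])
errs = 5.0 * sizes ** -0.3

alpha, beta = fit_power_law(sizes, errs)
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

Under this form, a change that only shifts the error corresponds to a different alpha with the same beta, and sublinear model size growth corresponds to the same kind of fit with an exponent between 0 and 1.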