Multi-emotion sentiment classification is a natural language processing (NLP) problem with valuable use cases on real-world data. We demonstrate that large-scale unsupervised language modeling combined with fine-tuning offers a practical solution to this task on difficult datasets, including those with label class imbalance and domain-specific context. By training an attention-based Transformer network (Vaswani et al. 2017) on 40GB of text (Amazon reviews) (McAuley et al. 2015) and fine-tuning on the training set, our model achieves a 0.69 F1 score on the SemEval Task 1:E-c multi-dimensional emotion classification problem (Mohammad et al. 2018), based on the Plutchik wheel of emotions (Plutchik 1979). These results are competitive with state-of-the-art models, including strong F1 scores on difficult emotion categories such as Fear (0.73), Disgust (0.77) and Anger (0.78), as well as competitive results on rare categories such as Anticipation (0.42) and Surprise (0.37). Furthermore, we demonstrate our application on a real-world text classification task. We create a narrowly collected text dataset of real tweets on several topics, and show that our fine-tuned model outperforms general-purpose, commercially available APIs for sentiment and multi-dimensional emotion classification on this dataset by a significant margin. We also perform a variety of additional studies, investigating properties of deep learning architectures, datasets and algorithms for achieving practical multi-dimensional sentiment classification. Overall, we find that unsupervised language modeling followed by fine-tuning is a simple framework for achieving high-quality results on real-world sentiment classification.