Categorical distributions are ubiquitous in machine learning, e.g., inclassification, language models, and recommendation systems. They are also atthe core of discrete choice models. However, when the number of possibleoutcomes is very large, using categorical distributions becomes computationallyexpensive, as the complexity scales linearly with the number of outcomes. Toaddress this problem, we propose augment and reduce (A&R), a method toalleviate the computational complexity. A&R uses two ideas: latent variableaugmentation and stochastic variational inference. It maximizes a lower boundon the marginal likelihood of the data. Unlike existing methods which arespecific to softmax, A&R is more general and is amenable to other categoricalmodels, such as multinomial probit. On several large-scale classificationproblems, we show that A&R provides a tighter bound on the marginal likelihoodand has better predictive performance than existing approaches.