Multilayer switch networks are proposed as artificial generators of high-dimensional discrete data (e.g., binary vectors, categorical data, natural language, network log files, and discrete-valued time series). Unlike deconvolution networks, which generate continuous-valued data and consist of upsampling filters and reverse pooling layers, multilayer switch networks are composed of adaptive switches that model conditional distributions of discrete random variables. An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. To learn network parameters, stochastic gradient descent is applied to the objective. This direct optimization is stable until convergence, and does not involve back-propagation over separate encoder and decoder networks, or adversarial training of dueling networks. While training remains tractable for moderately sized networks, Markov-chain Monte Carlo (MCMC) approximations of gradients are derived for deep networks that contain latent variables. The statistical framework is evaluated on synthetic data, high-dimensional binary data of handwritten digits, and web-crawled natural language data. Aspects of the framework such as interpretability, computational complexity, and generalization ability are discussed.