Normalizing flows have been shown to be a powerful class of generative models for continuous random variables, giving both strong performance and the potential for non-autoregressive generation. These benefits are also desired when modeling discrete random variables such as text, but directly applying normalizing flows to discrete sequences poses significant additional challenges. We propose a generative model which jointly learns a normalizing flow-based distribution in the latent space and a stochastic mapping to an observed discrete space. In this setting, we find that it is crucial for the flow-based distribution to be highly multimodal. To capture this property, we propose several normalizing flow architectures to maximize model flexibility. Experiments consider common discrete sequence tasks of character-level language modeling and polyphonic music generation. Our results indicate that an autoregressive flow-based model can match the performance of a comparable autoregressive baseline, and a non-autoregressive flow-based model can improve generation speed with a penalty to performance.