We propose a segmental neural language model that combines the representational power of neural networks with the structure learning mechanism of Bayesian nonparametrics, and show that it learns to discover semantically meaningful units (e.g., morphemes and words) from unsegmented character sequences. The model generates text as a sequence of segments, where each segment is generated either character-by-character from a sequence model or as a single draw from a lexical memory that stores multi-character units. Its parameters are fit to maximize the marginal likelihood of the training data, summing over all segmentations of the input, and its hyperparameters are likewise set to optimize held-out marginal likelihood. To prevent the model from overusing the lexical memory, which leads to poor generalization and bad segmentation, we introduce a differentiable regularizer that penalizes the model based on the expected length of each segment. To our knowledge, this is the first demonstration of neural networks that have predictive distributions better than LSTM language models and also infer segmentations into word-like units that are competitive with those of the best existing word discovery models.
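To make the phrase "summing over all segmentations of the input" concrete, the following is a minimal sketch of the standard forward dynamic program for that sum, not the model's actual implementation. It assumes a context-independent toy segment scorer (`toy_seg_logprob`) that mixes a small hypothetical lexicon with a uniform character model; the lexicon entries, the constants `GATE`, `STOP`, and `VOCAB_SIZE`, and the maximum segment length are all illustrative assumptions, and a neural model would condition each segment on the preceding text.

```python
import math

# Hypothetical toy components (assumptions for illustration only).
LEXICON = {"the": 0.5, "cat": 0.3, "a": 0.2}  # lexical memory: unit -> probability
VOCAB_SIZE = 26   # size of the character inventory for the uniform character model
STOP = 0.3        # probability of ending a segment after each character
GATE = 0.5        # mixture weight on the lexical memory


def toy_seg_logprob(seg):
    """Log-probability of one segment under a lexicon/character mixture."""
    # Character path: each character is uniform; stop after the last one.
    p_char = (1.0 / VOCAB_SIZE) ** len(seg) * (1 - STOP) ** (len(seg) - 1) * STOP
    # Lexical path: a single draw from the memory (zero if the unit is absent).
    p_lex = LEXICON.get(seg, 0.0)
    return math.log(GATE * p_lex + (1 - GATE) * p_char)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_marginal(text, seg_logprob, max_len):
    """Log marginal likelihood of text, summed over all segmentations
    whose segments have length at most max_len (forward recursion)."""
    n = len(text)
    alpha = [-math.inf] * (n + 1)  # alpha[t]: log-prob of generating text[:t]
    alpha[0] = 0.0
    for t in range(1, n + 1):
        # The last segment text[s:t] can start at any s within max_len of t.
        terms = [alpha[s] + seg_logprob(text[s:t])
                 for s in range(max(0, t - max_len), t)]
        alpha[t] = logsumexp(terms)
    return alpha[n]


def brute_force_log_marginal(text, seg_logprob, max_len):
    """Same quantity by explicit enumeration; exponential, for checking only."""
    def total(start):
        if start == len(text):
            return 1.0
        last = min(len(text), start + max_len)
        return sum(math.exp(seg_logprob(text[start:end])) * total(end)
                   for end in range(start + 1, last + 1))
    return math.log(total(0))


if __name__ == "__main__":
    print(log_marginal("thecat", toy_seg_logprob, max_len=4))
    print(brute_force_log_marginal("thecat", toy_seg_logprob, max_len=4))
```

The recursion visits each end position once and considers at most `max_len` start positions for the final segment, so the cost is linear in the text length for a fixed `max_len`, whereas the explicit enumeration grows exponentially. Because every step is a sum of products, the same computation stays differentiable when the segment scores come from a neural network.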