Abstract
We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill in the middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model, trained with these best practices, in our API, and we release our infilling benchmarks to aid future research.
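To make the data transformation concrete, here is a minimal sketch of how a single document might be rearranged. It assumes character-level span selection with two uniformly random cut points, a hypothetical fim_rate parameter standing in for the transformation frequency, and hypothetical sentinel strings <PRE>, <SUF>, and <MID>. It uses a prefix, suffix, middle ordering, which is one possible structure for the transformation. None of these names or choices are taken from a specific implementation.

```python
import random

# Hypothetical sentinel strings marking the three pieces (an assumption).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Move a random middle span of `doc` to the end, marked with sentinels.

    With probability 1 - fim_rate the document is returned unchanged, so only
    a fraction of the training data is transformed (fim_rate is hypothetical).
    """
    if rng.random() >= fim_rate:
        return doc  # left-to-right example, untouched
    # Pick two cut points uniformly at random over character positions
    # (character-level splitting is an assumption for this sketch).
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Prefix-suffix-middle ordering: the model sees the prefix and suffix
    # first, then learns to generate the middle span left-to-right.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


if __name__ == "__main__":
    rng = random.Random(0)
    print(fim_transform("def add(a, b):\n    return a + b\n", rng, fim_rate=1.0))
```

Because the transformed example is still an ordinary token sequence, a standard autoregressive model can train on it without architectural changes; at inference time one would supply the prefix and suffix and let the model generate after the middle sentinel.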