Abstract
Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation, Causal Language Modelling (CLM), which generates text sequentially from left to right, inherently limits the freedom of the model, which does not decide when and where each token is generated. In contrast, Masked Language Modelling (MLM), primarily used for language understanding tasks, can generate tokens anywhere in the text and in any order. This paper conducts an extensive comparison of MLM and CLM approaches for text generation tasks. To do so, we pre-train several language models of comparable sizes on three different datasets, namely 1) medical discharge summaries, 2) movie plot synopses, and 3) authorship verification datasets. To assess the quality of the generations, we first employ quantitative metrics and then perform a qualitative human evaluation to analyse coherence and grammatical correctness. In addition, we evaluate the usefulness of the generated texts by using them in three different downstream tasks: 1) Entity Recognition, 2) Text Classification, and 3) Authorship Verification. The results show that MLM consistently outperforms CLM in text generation across all datasets, with higher quantitative scores and better coherence in the generated text. The study also finds \textit{no strong correlation} between the quality of the generated text and the performance of the models in the downstream tasks. With this study, we show that MLM for text generation has great potential for future research, and we provide direction for future studies in this area.