Unsupervised Pre-training for Natural Language Generation: A Literature Review

Abstract

Unsupervised pre-training has recently gained popularity in computational linguistics, thanks to its surprising success in advancing natural language understanding (NLU) and its potential to effectively exploit large-scale unlabelled corpora. Despite this success in NLU, the power of unsupervised pre-training has been only partially exploited in natural language generation (NLG). The major obstacle stems from an idiosyncratic property of NLG: texts are usually generated conditioned on some context, which may vary with the target application. As a result, it is intractable to design a universal architecture for pre-training, as is done in NLU scenarios. Moreover, retaining the knowledge learned during pre-training while learning the target task is also a non-trivial problem. This review summarizes recent efforts to enhance NLG systems with unsupervised pre-training, with a special focus on methods that facilitate the integration of pre-trained models into downstream tasks. These methods are classified into architecture-based methods and strategy-based methods, according to how they handle the above obstacle. Discussions are also provided to give further insight into the relationship between these two lines of work, some informative empirical phenomena, and some possible directions for future work.
