PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

Abstract

Self-supervised pre-training, as in BERT, MASS and BART, has emerged as a powerful technique for natural language understanding and generation. Existing pre-training techniques employ autoencoding and/or autoregressive objectives to train Transformer-based models by recovering original word tokens from corrupted text with some masked tokens. The training goals of existing techniques are often inconsistent with the goals of many language generation tasks, such as generative question answering and conversational response generation, which require producing new text given context. This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. The new scheme alleviates the mismatch introduced by the existing denoising scheme between pre-training and fine-tuning where generation is more than reconstructing original text. An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks covering generative question answering (Rank 1 on the official MARCO leaderboard), abstractive summarization on CNN/DailyMail as well as Gigaword, question generation on SQuAD, and conversational response generation on Cornell Movie Dialogues.
