Encoder-Agnostic Adaptation for Conditional Language Generation

Abstract

Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks. However, it is an open question how to use similar techniques for language generation. Early results in the encoder-agnostic setting have been mostly negative. In this work, we explore methods for adapting a pretrained language model to arbitrary conditional input. We observe that pretrained transformer models are sensitive to large parameter changes during tuning. We therefore propose an adaptation that directly injects arbitrary conditioning into self attention, an approach we call pseudo self attention. Through experiments on four diverse conditional text generation tasks, we show that this encoder-agnostic technique outperforms strong baselines, produces coherent generations, and is data efficient.
