Can Unconditional Language Models Recover Arbitrary Sentences?

Abstract

Neural network-based generative language models like ELMo and BERT can workeffectively as general purpose sentence encoders in text classification withoutfurther fine-tuning. Is it possible to adapt them in a similar way for use asgeneral-purpose decoders? For this to be possible, it would need to be the casethat for any target sentence of interest, there is some continuousrepresentation that can be passed to the language model to cause it toreproduce that sentence. We set aside the difficult problem of designing anencoder that can produce such representations and, instead, ask directlywhether such representations exist at all. To do this, we introduce a pair ofeffective, complementary methods for feeding representations into pretrainedunconditional language models and a corresponding set of methods to mapsentences into and out of this representation space, the reparametrizedsentence space. We then investigate the conditions under which a language modelcan be made to generate a sentence through the identification of a point insuch a space and find that it is possible to recover arbitrary sentences nearlyperfectly with language models and representations of moderate size withoutmodifying any model parameters.

Quick Read (beta)

loading the full paper ...