How Context Affects Language Models' Factual Predictions

Abstract

When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information outside the model weights using supervised architectures that combine an information retrieval system with a machine reading component. In this paper, we go a step further and integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way. We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline. Furthermore, processing query and context with different segment tokens allows BERT to utilize its Next Sentence Prediction pre-trained classifier to determine whether the context is relevant or not, substantially improving BERT's zero-shot cloze-style question-answering performance and making its predictions robust to noisy contexts.
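To make the segment-token idea concrete, the sketch below packs a cloze query and a retrieved context into a single BERT-style input and assigns each token a segment id (what BERT implementations call token type ids). It is an illustration under stated assumptions, not the authors' code: the abstract does not specify the order of the two parts, so placing the query in segment A (id 0) and the context in segment B (id 1) is an assumption; whitespace splitting stands in for BERT's WordPiece tokenizer; and the function name `build_bert_pair_input` and its `separate_segments` flag are hypothetical.

```python
from typing import List, Tuple

MASK = "[MASK]"


def build_bert_pair_input(
    query: str, context: str, separate_segments: bool = True
) -> Tuple[List[str], List[int]]:
    """Pack a cloze query and a retrieved context into one BERT-style input.

    Assumptions (not from the paper): the query is segment A and the context
    is segment B; whitespace splitting replaces WordPiece tokenization.
    """
    query_tokens = query.split()
    context_tokens = context.split()
    if MASK not in query_tokens:
        raise ValueError("a cloze query must contain a [MASK] token")

    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]

    # [CLS], the query tokens and the first [SEP] form segment A.
    segment_a_len = 1 + len(query_tokens) + 1
    # The context tokens and the final [SEP] form segment B.
    segment_b_len = len(context_tokens) + 1

    if separate_segments:
        # Distinct segment ids mirror BERT's sentence-pair pre-training
        # format, the setting in which its NSP head was trained.
        segment_ids = [0] * segment_a_len + [1] * segment_b_len
    else:
        # Baseline for comparison: plain concatenation in a single segment.
        segment_ids = [0] * (segment_a_len + segment_b_len)
    return tokens, segment_ids


if __name__ == "__main__":
    toks, segs = build_bert_pair_input(
        "Dante was born in [MASK] .", "Dante Alighieri was born in Florence ."
    )
    for tok, seg in zip(toks, segs):
        print(f"{seg} {tok}")
```

With `separate_segments=True` the query and context occupy different segments, which matches the sentence-pair format BERT saw during Next Sentence Prediction pre-training; setting it to `False` gives a single-segment concatenation to contrast against.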
