Deep contextualized word representations

Abstract

We introduce a new type of deep contextualized word representation thatmodels both (1) complex characteristics of word use (e.g., syntax andsemantics), and (2) how these uses vary across linguistic contexts (i.e., tomodel polysemy). Our word vectors are learned functions of the internal statesof a deep bidirectional language model (biLM), which is pre-trained on a largetext corpus. We show that these representations can be easily added to existingmodels and significantly improve the state of the art across six challengingNLP problems, including question answering, textual entailment and sentimentanalysis. We also present an analysis showing that exposing the deep internalsof the pre-trained network is crucial, allowing downstream models to mixdifferent types of semi-supervision signals.

Quick Read (beta)

loading the full paper ...