Learning to Compute Word Embeddings On the Fly

  • 2018-03-07 16:07:10
  • Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

Abstract

Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words with a unique representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data with a network trained end-to-end for the downstream task. We show that this improves results against baselines where embeddings are trained on the end task for reading comprehension, recognizing textual entailment and language modeling.
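
To make the idea of computing a rare word's embedding "on the fly" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify what the auxiliary data is or how it is combined, so the sketch assumes, purely for illustration, that each rare word comes with a short list of descriptive tokens (for example, a dictionary definition) and that the embedding is the mean of the trained embeddings of those tokens. The names `embed_on_the_fly`, `mean_pool`, `trained`, `auxiliary`, and `unk` are hypothetical, and the toy vectors are fixed by hand rather than learned end-to-end as in the paper.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise average of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_on_the_fly(
    word: str,
    trained: Dict[str, Vector],
    auxiliary: Dict[str, List[str]],
    unk: Vector,
) -> Vector:
    """Return an embedding for `word`.

    Frequent words use their trained embedding directly. Rare words are
    embedded by mean-pooling the trained embeddings of their auxiliary
    tokens (an assumption of this sketch). If no auxiliary token is known,
    fall back to the shared out-of-vocabulary vector `unk`.
    """
    if word in trained:
        return trained[word]
    known = [trained[t] for t in auxiliary.get(word, []) if t in trained]
    if not known:
        return unk
    return mean_pool(known)


# Toy data: three frequent words with hand-picked 2-d embeddings.
trained = {"small": [1.0, 0.0], "cat": [0.0, 1.0], "wild": [1.0, 1.0]}
# Hypothetical auxiliary data: a short definition for one rare word.
auxiliary = {"ocelot": ["small", "wild", "cat"]}
unk = [0.0, 0.0]

ocelot_vec = embed_on_the_fly("ocelot", trained, auxiliary, unk)
cat_vec = embed_on_the_fly("cat", trained, auxiliary, unk)
missing_vec = embed_on_the_fly("zyzzyva", trained, auxiliary, unk)
```

In this sketch the rare word "ocelot" receives a vector derived from its definition instead of the single shared out-of-vocabulary vector, which is the baseline behaviour the abstract contrasts against. In an end-to-end setting, the pooling step would be replaced or followed by trainable parameters updated by the downstream task loss.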

 
