Deconstructing and reconstructing word embedding algorithms

Abstract

Uncontextualized word embeddings are reliable feature representations of words used to obtain high-quality results for various NLP applications. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others into a common form, unveiling some of the necessary and sufficient conditions required for making performant word embeddings. We find that each algorithm: (1) fits vector-covector dot products to approximate pointwise mutual information (PMI); and (2) modulates the loss gradient to balance weak and strong signals. We demonstrate that these two algorithmic features are sufficient conditions to construct a novel word embedding algorithm, Hilbert-MLE. We find that its embeddings perform equivalently to or better than other algorithms across 17 intrinsic and extrinsic datasets.
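The two shared features named above can be illustrated with a small toy, assuming a dense word-by-context co-occurrence count matrix. The sketch computes PMI(i, j) = log p(i, j) / (p(i) p(j)). It then fits word vectors w_i and context covectors c_j by gradient descent so that w_i · c_j approximates PMI(i, j). A per-pair weight scales each pair's contribution to the loss gradient. That weight is borrowed from GloVe's weighting function, and it damps rare (weak) co-occurrences relative to frequent (strong) ones. This is not the paper's Hilbert-MLE objective. The helpers `pmi_matrix` and `fit_embeddings` and the parameters `x_max` and `alpha` are hypothetical names chosen for illustration. Masking unobserved pairs (weight 0) is also an assumption of this sketch.

```python
import numpy as np


def pmi_matrix(counts: np.ndarray) -> np.ndarray:
    """PMI log(p(i,j) / (p(i) p(j))) from a co-occurrence count matrix.

    Pairs with zero counts get -inf; callers are expected to mask them.
    """
    total = counts.sum()
    p_ij = counts / total
    p_i = p_ij.sum(axis=1, keepdims=True)  # marginal over words (rows)
    p_j = p_ij.sum(axis=0, keepdims=True)  # marginal over contexts (columns)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(p_ij) - np.log(p_i) - np.log(p_j)


def fit_embeddings(counts, dim=2, lr=0.05, steps=2000, x_max=10.0, alpha=0.75, seed=0):
    """Fit vector-covector dot products W @ C.T to PMI with a weighted squared loss.

    Hypothetical sketch: the GloVe-style weight min(1, (count / x_max) ** alpha)
    modulates each pair's gradient; unobserved pairs get weight 0 (assumption).
    """
    rng = np.random.default_rng(seed)
    observed = counts > 0
    target = np.where(observed, pmi_matrix(counts), 0.0)
    weight = np.where(observed, np.minimum(1.0, (counts / x_max) ** alpha), 0.0)

    n_words, n_contexts = counts.shape
    W = 0.1 * rng.standard_normal((n_words, dim))     # word vectors
    C = 0.1 * rng.standard_normal((n_contexts, dim))  # context covectors

    for _ in range(steps):
        # Gradient of 0.5 * sum(weight * (W @ C.T - target) ** 2)
        residual = weight * (W @ C.T - target)
        grad_W = residual @ C
        grad_C = residual.T @ W
        W -= lr * grad_W
        C -= lr * grad_C

    loss = 0.5 * np.sum(weight * (W @ C.T - target) ** 2)
    return W, C, loss


# Toy usage: a symmetric 3x3 co-occurrence matrix with one unobserved pair.
counts = np.array([[10.0, 2.0, 0.0],
                   [2.0, 8.0, 3.0],
                   [0.0, 3.0, 6.0]])
W, C, loss = fit_embeddings(counts)
```

Setting `steps=0` returns the loss at the random initialization, which gives a simple way to confirm that the weighted fit reduces the error.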
