Angular-Based Word Meta-Embedding Learning

Abstract

Ensembling word embeddings to improve distributed word representations has shown good success for natural language processing tasks in recent years. These approaches either carry out straightforward mathematical operations over a set of vectors or use unsupervised learning to find a lower-dimensional representation. This work compares meta-embeddings trained with different losses, namely loss functions that account for the angular distance between the reconstructed embedding and the target, and those that account for normalized distances based on the vector length. We argue that meta-embedding methods should treat the ensemble set equally in unsupervised learning, as the respective quality of each embedding is unknown for upstream tasks prior to meta-embedding. We show that normalization methods that account for this, such as cosine and KL-divergence objectives, outperform meta-embeddings trained on standard $\ell_1$ and $\ell_2$ losses on de facto word similarity and relatedness datasets, and that they outperform existing meta-learning strategies.
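
To make the contrast between the loss families concrete, the following is a minimal sketch of the four reconstruction objectives named in the abstract, applied to a single reconstructed vector and its target. The function names, the use of a softmax to turn vectors into distributions before the KL divergence, and the toy vectors are all assumptions made for illustration; the abstract does not specify these details.

```python
import math

def l1_loss(pred, target):
    # Sum of absolute coordinate differences (sensitive to vector length).
    return sum(abs(p - t) for p, t in zip(pred, target))

def l2_loss(pred, target):
    # Sum of squared coordinate differences (sensitive to vector length).
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def cosine_loss(pred, target):
    # One minus cosine similarity: depends only on the angle between vectors.
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

def softmax(vec):
    # Numerically stable softmax; an assumed way to normalize an embedding
    # into a probability distribution.
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

def kl_loss(pred, target):
    # KL(softmax(target) || softmax(pred)), an assumed formulation.
    p = softmax(target)
    q = softmax(pred)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical toy vectors: `scaled` points in the same direction as `target`
# but is twice as long.
target = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]

print("l1:", l1_loss(scaled, target))
print("l2:", l2_loss(scaled, target))
print("cosine:", cosine_loss(scaled, target))
print("kl:", kl_loss(scaled, target))
```

In this toy case the cosine loss is zero up to floating-point error because the two vectors differ only in length, while the $\ell_1$ and $\ell_2$ losses penalize the length difference. The softmax-based KL term is not scale invariant, so it is also nonzero here.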
