An Exploration of Word Embedding Initialization in Deep-Learning Tasks

Abstract

Word embeddings are the interface between the world of discrete units of text processing and the continuous, differentiable world of neural networks. In this work, we examine various random and pretrained initialization methods for embeddings used in deep networks and their effect on the performance on four NLP tasks with both recurrent and convolutional architectures. We confirm that pretrained embeddings are a little better than random initialization, especially considering the speed of learning. On the other hand, we do not see any significant difference between various methods of random initialization, as long as the variance is kept reasonably low. High-variance initialization prevents the network from using the space of embeddings and forces it to use other free parameters to accomplish the task. We support this hypothesis by observing the performance in learning lexical relations and by the fact that the network can learn to perform reasonably in its task even with fixed random embeddings.
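
To make the contrast between low-variance and high-variance random initialization concrete, the following is a minimal sketch using only the Python standard library. It is not the setup used in this work: the function names, the zero-mean normal distribution, the table size, and the two standard deviations (0.01 and 10.0) are all assumptions chosen purely for illustration.

```python
import random
import statistics


def init_embeddings(vocab_size, dim, std, seed=0):
    """Draw a vocab_size x dim embedding table from N(0, std^2).

    Hypothetical helper for illustration; the distribution and its
    parameters are assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(vocab_size)]


def empirical_std(table):
    """Population standard deviation over all entries of the table."""
    return statistics.pstdev([x for row in table for x in row])


# Two illustrative settings: a low-variance and a high-variance table.
low = init_embeddings(vocab_size=1000, dim=50, std=0.01)
high = init_embeddings(vocab_size=1000, dim=50, std=10.0)

print("low-variance std: ", empirical_std(low))
print("high-variance std:", empirical_std(high))
```

A "fixed random embeddings" experiment in this spirit would build such a table once and exclude it from gradient updates, so that only the remaining parameters of the network are trained.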
