Neural Random Projections for Language Modelling

Abstract

Neural network-based language models deal with data sparsity problems by mapping the large discrete space of words into a smaller continuous space of real-valued vectors. By learning distributed vector representations for words, each training sample informs the neural network model about a combinatorial number of other patterns. We exploit the sparsity in natural language even further by encoding each unique input word using a reduced sparse random representation. In this paper, we propose an encoder for discrete inputs that uses random projections to allow for the learning of language models using significantly smaller parameter spaces when compared with similar neural network architectures. Furthermore, random projections also eliminate the dependency between a neural network architecture and the size of a pre-established dictionary. We investigate the properties of our encoding mechanism empirically, by evaluating its performance on the widely used Penn Treebank corpus, using several configurations of baseline feedforward neural network models. We show that guaranteeing approximately equidistant inner products between representations of unique discrete inputs is enough to provide the neural network model with enough information to learn useful distributed representations for these inputs. By not requiring prior enumeration of the lexicon, random projections allow us to face the dynamic and open character of natural languages.
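
To make the encoding idea concrete, the following is a minimal sketch of a sparse random representation for words. It is not the encoder evaluated in the paper. The function names (`random_index_vector`, `inner_product`), the dimension `dim=1000`, the number of nonzero entries `nonzero=8`, and the use of a hash of the word as a random seed are all assumptions made for illustration. The sketch only shows the two properties the abstract relies on: a word can be encoded without enumerating the lexicon first, and the inner product between encodings of different words is bounded by, and typically much smaller than, the inner product of an encoding with itself.

```python
import hashlib
import random


def random_index_vector(word, dim=1000, nonzero=8):
    """Return a sparse ternary vector for `word` as {position: +1 or -1}.

    Assumption for illustration: the vector is derived from a hash of the
    word, so no prior enumeration of the vocabulary is needed.
    """
    # Seed a private RNG from the word so the same word always maps to
    # the same vector, independently of any dictionary.
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Pick `nonzero` distinct active positions out of `dim`.
    positions = rng.sample(range(dim), nonzero)
    half = nonzero // 2
    # The first half of the active positions get +1, the rest get -1.
    return {p: (1 if i < half else -1) for i, p in enumerate(positions)}


def inner_product(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(value * v.get(pos, 0) for pos, value in u.items())


words = ["the", "cat", "sat", "on", "mat", "an-unseen-word"]
vectors = {w: random_index_vector(w) for w in words}

for a in words:
    # Each row shows the inner product of `a` with every word in `words`;
    # the diagonal entry always equals the number of nonzero entries.
    print(a, [inner_product(vectors[a], vectors[b]) for b in words])
```

In this sketch the input layer of a network would have `dim` units regardless of how many distinct words appear, which is the sense in which the architecture no longer depends on the dictionary size.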
