When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting

Abstract

Since the seminal work of Mikolov et al. (2013a) and Bojanowski et al. (2017), word representations of shallow log-bilinear language models have found their way into many NLP applications. Mikolov et al. (2018) introduced a positional log-bilinear language model, which has characteristics of an attention-based language model and which has reached state-of-the-art performance on the intrinsic word analogy task. However, the positional model has never been evaluated on qualitative criteria or extrinsic tasks and its speed is impractical. We outline the similarities between the attention mechanism and the positional model, and we propose a constrained positional model, which adapts the sparse attention mechanism of Dai et al. (2018). We evaluate the positional and constrained positional models on three novel qualitative criteria and on the extrinsic language modeling task of Botha and Blunsom (2014). We show that the positional and constrained positional models contain interpretable information about word order and outperform the subword model of Bojanowski et al. (2017) on language modeling. We also show that the constrained positional model outperforms the positional model on language modeling and is twice as fast.
