Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

  • 2018-02-09 01:15:08
  • Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

Abstract

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
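
The matrix factorization view can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's code: the names `H`, `W`, `N`, `V`, `d`, `matrix_rank`, and `logit_matrix` are hypothetical, and the values are random integers rather than trained parameters. If each of N contexts is a d-dimensional vector and each of V words is a d-dimensional embedding, the N x V matrix of logits is the product of an N x d matrix and a d x V matrix, so its rank cannot exceed d, however many contexts and words there are. The per-context normalization in Softmax shifts each row by a constant, which can raise the rank by at most one.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Exact rank of a matrix, via Gaussian elimination over rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def logit_matrix(contexts, embeddings):
    """Entry (c, x) is the dot product of context vector c and word embedding x."""
    return [[sum(h * w for h, w in zip(hc, wx)) for wx in embeddings]
            for hc in contexts]


# Illustrative sizes (assumptions): 12 contexts, 10 words, embedding size 3.
random.seed(0)
N, V, d = 12, 10, 3
H = [[random.randint(-3, 3) for _ in range(d)] for _ in range(N)]  # context vectors
W = [[random.randint(-3, 3) for _ in range(d)] for _ in range(V)]  # word embeddings

A = logit_matrix(H, W)  # N x V matrix of logits
print("rank of logit matrix:", matrix_rank(A), "with embedding size d =", d)
```

Raising `N` or `V` in this sketch leaves the bound unchanged; only a larger `d` lifts it, which is the capacity limit the abstract refers to.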

 
