Counting in Language with RNNs

  • 2018-10-31 14:07:15
  • Heng xin Fun, Sergiy V Bokhnyak, Francesco Saverio Zuppichini
In this paper we examine a possible reason why the LSTM outperforms the GRU on language modeling and, more specifically, machine translation. We hypothesize that this has to do with counting, a consistent theme across the literature on long-term dependence, counting, and language modeling for RNNs. Using simplified forms of language (context-free and context-sensitive languages), we show exactly how the LSTM performs its counting, based on its cell states during inference, and why the GRU cannot perform as well.
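
To make the idea of counting in a cell state concrete, here is a minimal sketch of a single-unit LSTM cell with hand-set weights reading a string from the context-free language a^n b^n. It is an illustration only, not the trained models or the analysis from the paper: the function name `lstm_cell_trace`, the +1/-1 input encoding, and the saturating values `gate_bias` and `cand_weight` are all assumptions chosen so that the forget and input gates sit near 1 and the candidate value is near +1 for `a` and -1 for `b`.

```python
import math


def sigmoid(z):
    """Logistic function used for the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_trace(string, gate_bias=20.0, cand_weight=20.0):
    """Return the cell state after each symbol for a hand-set, single-unit LSTM.

    Hypothetical setup (not from the paper): 'a' is encoded as +1 and any
    other symbol as -1; both gates are driven by a large constant bias.
    """
    c = 0.0
    trace = []
    for ch in string:
        x = 1.0 if ch == "a" else -1.0      # assumed input encoding
        f = sigmoid(gate_bias)              # forget gate, approximately 1
        i = sigmoid(gate_bias)              # input gate, approximately 1
        g = math.tanh(cand_weight * x)      # candidate, approximately +1 or -1
        c = f * c + i * g                   # standard LSTM cell-state update
        trace.append(c)
    return trace


trace = lstm_cell_trace("aaabbb")
print([round(c, 3) for c in trace])
```

With these settings the cell state rises by about one for each `a` and falls by about one for each `b`, returning to roughly zero exactly when the string is balanced. This works because the LSTM cell update is additive and unbounded; a GRU's hidden state, being a gated interpolation between values in (-1, 1), stays bounded and cannot accumulate a count in the same way.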


