Finnish Language Modeling with Deep Transformer Models

  • 2020-03-27 10:02:24
  • Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo

Abstract

Transformers have recently taken center stage in language modeling, after LSTMs were long considered the dominant model architecture. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting for Finnish and compare these models with the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which, as far as we know, is the first such measure reported. Transformer-XL improves the perplexity score to 73.58, which is 27% better than the LSTM model.
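The two metrics quoted in the abstract can be made concrete with a small sketch. It assumes the common definitions: perplexity is the exponentiated mean negative log-likelihood of each token given its left context, and pseudo-perplexity applies the same formula, but each token is scored by a masked language model given all other tokens. The `score_masked` callback, the toy tokens, and all numbers below are hypothetical stand-ins, not the paper's code or data, and the normalization unit (here, one entry per sub-word token) is also an assumption.

```python
import math
from typing import Callable, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Exponentiated mean negative log-likelihood over tokens.

    Each entry is assumed to be a natural-log probability log P(w_i | w_<i)
    from a left-to-right model (e.g. an LSTM or Transformer-XL).
    """
    if len(token_log_probs) == 0:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def pseudo_perplexity(
    tokens: Sequence[str],
    score_masked: Callable[[Sequence[str], int], float],
) -> float:
    """Pseudo-perplexity for a masked LM such as BERT.

    score_masked(tokens, i) is a HYPOTHETICAL callback that masks position i
    and returns log P(tokens[i] | all other tokens). Each token is masked
    once, and the resulting log-probabilities go through the same formula.
    """
    masked_log_probs = [score_masked(tokens, i) for i in range(len(tokens))]
    return perplexity(masked_log_probs)


def relative_improvement(baseline: float, new: float) -> float:
    """Fractional perplexity reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Toy log-probabilities, not results from the paper.
    print(perplexity([math.log(0.5), math.log(0.25)]))  # about 2.83
    # A dummy scorer that assigns probability 0.5 to every masked sub-word.
    print(pseudo_perplexity(["kis", "sa"], lambda toks, i: math.log(0.5)))  # about 2.0
    print(relative_improvement(100.0, 73.0))  # about 0.27
```

Because pseudo-perplexity lets the masked model see context on both sides of each token, it is usually much lower than left-to-right perplexity, so the 14.5 and 73.58 figures are not directly comparable with each other.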

 
