Abstract
Training large language models requires vast amounts of data, posing a challenge for less widely spoken languages like Norwegian and even more so for truly low-resource languages like Sámi. To address this issue, we present a novel three-stage continual training approach. We also experiment with combining causal and masked language modeling to get more flexible models. Based on our findings, we train, evaluate, and openly release a new large generative language model for Norwegian Bokmål, Nynorsk, and Northern Sámi with 11.4 billion parameters: NorMistral-11B.