FlauBERT: Unsupervised Language Model Pre-training for French

Abstract

Language models have become a key step in achieving state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled text now available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely demonstrated for English using contextualized representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches. Different versions of FlauBERT, as well as a unified evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared with the research community for further reproducible experiments in French NLP.
