JuriBERT: A Masked-Language Model Adaptation for French Legal Text

  • 2021-10-04 14:51:24
  • Stella Douka, Hadi Abdine, Michalis Vazirgiannis, Rajaa El Hamdani, David Restrepo Amariles

Abstract

Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on the adaptation of domain-specific BERT models in the French language. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages and their benefits for French legal text. We prove that domain-specific pre-trained models can perform better than their equivalent generalised ones in the legal domain. Finally, we release JuriBERT, a new set of BERT models adapted to the French legal domain.

 
