Customizing Contextualized Language Models for Legal Document Reviews

Abstract

Inspired by inductive transfer learning in computer vision, many efforts have been made to train contextualized language models that boost the performance of natural language processing tasks. These models are mostly trained on large general-domain corpora such as news, books, or Wikipedia. Although these pre-trained generic language models capture the semantic and syntactic essence of a language well, exploiting them in a real-world domain-specific scenario still requires practical considerations such as token distribution shifts, inference time, memory, and simultaneous proficiency in multiple tasks. In this paper, we focus on the legal domain and present how different language models trained on general-domain corpora can be best customized for multiple legal document reviewing tasks. We compare their efficiencies with respect to task performances and present practical considerations.
