Assessing the Performance Gap Between Lexical and Semantic Models for Information Retrieval With Formulaic Legal Language

Abstract

Legal passage retrieval is an important task that assists legal practitioners in the time-intensive process of finding relevant precedents to support legal arguments. This study investigates the task of retrieving legal passages or paragraphs from decisions of the Court of Justice of the European Union (CJEU), whose language is highly structured and formulaic, leading to repetitive patterns. Understanding when lexical or semantic models are more effective at handling the repetitive nature of legal language is key to developing retrieval systems that are more accurate, efficient, and transparent for specific legal domains. To this end, we explore when this routinized legal language is better suited for retrieval using methods that rely on lexical and statistical features, such as BM25, or dense retrieval models trained to capture semantic and contextual information. A qualitative and quantitative analysis with three complementary metrics shows that both lexical and dense models perform well in scenarios with more repetitive usage of language, whereas BM25 performs better than the dense models in more nuanced scenarios where repetition and verbatim quotes are less prevalent and in longer queries. Our experiments also show that BM25 is a strong baseline, surpassing off-the-shelf dense models in 4 out of 7 performance metrics. However, fine-tuning a dense model on domain-specific data led to improved performance, surpassing BM25 in most metrics, and we analyze the effect of the amount of data used in fine-tuning on the model's performance and temporal robustness. The code, dataset and appendix related to this work are available on: https://github.com/larimo/lexsem-legal-ir.
