Abstract
Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high-recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models, such as logistic regression or support vector machines, to lexical features. Transformer-based models with supervised tuning have been found to improve effectiveness on many text classification tasks, suggesting their use in TAR. We indeed find that the pre-trained BERT model reduces review volume by 30% in TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we find that linear models outperform BERT for simulated legal discovery topics on the Jeb Bush e-mail collection. This suggests that the match between transformer pre-training corpora and the task domain is more important than generally appreciated. Additionally, we show that just-right language model fine-tuning on the task collection before starting active learning is critical. Either too little or too much fine-tuning results in performance worse than that of linear models, even for RCV1-v2.