A thorough benchmark of automatic text classification: From traditional approaches to large language models

Abstract

Automatic text classification (ATC) has advanced remarkably over the past decade, as best exemplified by recent small and large language models (SLMs and LLMs) built on Transformer architectures. Despite these effectiveness improvements, the literature still lacks a comprehensive cost-benefit analysis of whether the effectiveness gains of these recent approaches compensate for their much higher costs compared with more traditional text classification approaches such as SVMs and Logistic Regression. In this context, this work makes two main contributions: (i) a scientifically sound comparative analysis of the cost-benefit of twelve traditional and recent ATC solutions, including five open LLMs, and (ii) a large benchmark comprising 22 datasets, covering sentiment analysis and topic classification, with train/validation/test partitions derived from k-fold cross-validation procedures, along with documentation and code. Releasing the code, data, and documentation enables the community to replicate the experiments and advance the field in a more scientifically sound manner. Our comparative experimental results indicate that, in terms of effectiveness, LLMs outperform traditional approaches (by up to 26%, and by 7.1% on average) and SLMs (by up to 4.9%, and by 1.9% on average). However, LLMs incur significantly higher computational costs due to fine-tuning, being on average 590x and 8.5x slower than traditional methods and SLMs, respectively. These results suggest the following recommendations: (1) LLMs for applications that require the best possible effectiveness and can afford the costs; (2) traditional methods such as Logistic Regression and SVM for resource-limited applications or those that cannot afford the cost of tuning LLMs; and (3) SLMs such as RoBERTa for a near-optimal effectiveness-efficiency trade-off.
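To illustrate what fold-based train/validation/test partitions look like, the following is a minimal sketch in Python using only the standard library. It assumes stratified k-fold splitting, with a stratified validation subset carved out of each fold's training portion. The function name `make_fold_partitions` and the defaults for `k`, `val_fraction`, and `seed` are hypothetical illustrative choices, not the benchmark's actual settings.

```python
import random
from collections import defaultdict


def make_fold_partitions(labels, k=5, val_fraction=0.1, seed=42):
    """Hypothetical sketch: build k stratified (train, val, test) index partitions.

    Each document index lands in exactly one test fold. For each fold, the
    validation set is drawn (stratified by label) from the other k-1 folds,
    and whatever remains becomes the training set.
    """
    rng = random.Random(seed)

    # Group document indices by class label so every fold keeps the label mix.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Deal each label's shuffled indices round-robin into k folds, continuing
    # from where the previous label stopped so fold sizes stay balanced.
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_label.values():
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)

    partitions = []
    for i in range(k):
        test = sorted(folds[i])
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]

        # Stratified validation split taken from the training portion only.
        rest_by_label = defaultdict(list)
        for idx in rest:
            rest_by_label[labels[idx]].append(idx)
        val = []
        for idxs in rest_by_label.values():
            rng.shuffle(idxs)
            val.extend(idxs[: round(len(idxs) * val_fraction)])

        val_set = set(val)
        train = sorted(idx for idx in rest if idx not in val_set)
        partitions.append({"train": train, "val": sorted(val), "test": test})
    return partitions


if __name__ == "__main__":
    labels = ["pos", "neg"] * 50  # toy dataset: 100 documents, balanced labels
    for p in make_fold_partitions(labels, k=5, val_fraction=0.1):
        print(len(p["train"]), len(p["val"]), len(p["test"]))  # prints "72 8 20" per fold
```

Fixing the seed makes the partitions reproducible. Releasing fixed splits such as these is what allows different methods to be compared on identical data.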
