Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks

Abstract

Semantic text classification requires understanding the contextual significance of specific tokens rather than surface-level patterns or keywords (as in rule-based or statistical text classification), making large language models (LLMs) well-suited for this task. However, semantic classification applications in industry, like customer intent detection or semantic role labeling, tend to be highly specialized. They require annotation by domain experts, in contrast to the general-purpose corpora used for pretraining. Further, they typically require high inference throughput, which limits the model size from latency and cost perspectives. Thus, for a range of specialized classification tasks, the preferred solution is to develop customized classifiers by finetuning smaller language models (e.g., mini-encoders, small language models). In this work, we develop a token-driven sparse finetuning strategy to adapt small language models to specialized classification tasks. We identify and finetune a small sensitive subset of model parameters by leveraging task-specific token constructs in the finetuning dataset, while leaving most of the pretrained weights unchanged. Unlike adapter approaches such as low-rank adaptation (LoRA), we do not introduce additional parameters to the model. Our approach identifies highly relevant semantic tokens (case study in the Appendix) and outperforms end-to-end finetuning, LoRA, layer selection, and prefix tuning on five diverse semantic classification tasks. We achieve greater stability and half the training costs vs. end-to-end finetuning.
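As a rough illustration of the general idea of sparse finetuning (updating a small selected subset of existing parameters while freezing the rest, with no added parameters), the following minimal sketch uses a toy linear classifier in plain Python. It is not the method of this paper: the selection score used here (gradient magnitude on the finetuning data) is an assumed placeholder for the token-driven criterion, whose details the abstract does not give, and all function names and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradients(weights, examples):
    """Mean logistic-loss gradient of a linear classifier over (features, label) pairs."""
    grad = [0.0] * len(weights)
    for features, label in examples:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x / len(examples)
    return grad


def select_sparse_mask(weights, examples, keep_fraction):
    """Mark the top `keep_fraction` of parameters as trainable.

    Assumption: |gradient| is a stand-in sensitivity score, not the
    token-driven selection described in the abstract.
    """
    scores = [abs(g) for g in gradients(weights, examples)]
    k = max(1, int(len(weights) * keep_fraction))
    top = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [False] * len(weights)
    for i in top:
        mask[i] = True
    return mask


def sparse_finetune(weights, examples, mask, lr=0.5, steps=50):
    """Gradient descent that updates only masked parameters; none are added."""
    weights = list(weights)  # copy so the pretrained weights stay intact
    for _ in range(steps):
        grad = gradients(weights, examples)
        for i, trainable in enumerate(mask):
            if trainable:
                weights[i] -= lr * grad[i]
    return weights


random.seed(0)
pretrained = [random.uniform(-0.1, 0.1) for _ in range(8)]

# Toy finetuning set: only features 0 and 1 carry the label signal.
data = []
for _ in range(64):
    x = [random.uniform(-1, 1) for _ in range(8)]
    data.append((x, 1.0 if x[0] + x[1] > 0 else 0.0))

mask = select_sparse_mask(pretrained, data, keep_fraction=0.25)
finetuned = sparse_finetune(pretrained, data, mask)
```

In this sketch the parameter count is unchanged after finetuning and every unselected weight keeps its pretrained value, which is the property that distinguishes sparse finetuning from adapter methods such as LoRA.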
