DSG-KD: Knowledge Distillation from Domain-Specific to General Language Models

Abstract

The use of pre-trained language models fine-tuned to address specific downstream tasks is a common approach in natural language processing (NLP). However, acquiring domain-specific knowledge via fine-tuning is challenging. Traditional methods involve pre-training language models on vast amounts of domain-specific data before fine-tuning them for particular tasks. This study investigates emergency/non-emergency classification tasks based on electronic medical record (EMR) data obtained from pediatric emergency departments (PEDs) in Korea. Our findings reveal that existing domain-specific pre-trained language models underperform general language models in handling the N-lingual free-text data characteristic of non-English-speaking regions. To address these limitations, we propose a domain knowledge transfer methodology that leverages knowledge distillation (KD) to infuse general language models with domain-specific knowledge via fine-tuning. This study demonstrates the effective transfer of specialized knowledge between models by defining a general language model as the student model and a domain-specific pre-trained model as the teacher model. In particular, we address the complexities of EMR data obtained from PEDs in non-English-speaking regions, such as Korea, and demonstrate that the proposed method enhances classification performance in such contexts. The proposed methodology not only outperforms baseline models on Korean PED EMR data but also shows promise for broader applicability in other professional and technical domains. In future work, we intend to extend this methodology to diverse non-English-speaking regions and additional downstream tasks, with the aim of developing advanced model architectures using state-of-the-art KD techniques. The code is available at https://github.com/JoSangYeon/DSG-KD.
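The abstract does not state which distillation objective DSG-KD uses. As a hedged illustration of the generic teacher-student setup it describes, the sketch below implements the widely used temperature-scaled (soft-target) distillation loss for a single example: the student is pushed toward the teacher's softened class distribution and, in parallel, toward the ground-truth label. The function names, the temperature and alpha defaults, and the label encoding (0 = non-emergency, 1 = emergency) are illustrative assumptions, not details taken from the paper or its repository.

```python
import math
from typing import Sequence

# Hypothetical sketch: a generic soft-target KD loss, NOT necessarily DSG-KD's objective.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # assumed value; softens both distributions
    alpha: float = 0.5,        # assumed weight between soft and hard terms
) -> float:
    # Soft-target term: match the teacher's (domain-specific model's) softened distribution.
    # Scaling by T^2 keeps its gradient magnitude comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2

    # Hard-label term: ordinary cross-entropy against the ground truth at T = 1.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Hypothetical binary example: label 1 = emergency.
    teacher = [0.2, 2.0]   # teacher is fairly confident the case is an emergency
    student = [1.0, 0.5]   # student currently leans the other way
    print(kd_loss(student, teacher, label=1))
```

Setting `alpha=1.0` reduces this to pure distillation from the teacher, and `alpha=0.0` reduces it to ordinary fine-tuning on labels; in a real training loop the same computation would be batched and differentiated with a framework such as PyTorch or JAX.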
