Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments

Abstract

This study proposes a method for knowledge distillation (KD) of fine-tuned Large Language Models (LLMs) into smaller, more efficient, and accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model (a neural network) on the prediction probabilities of the LLM, which serves as the teacher model and whose outputs are used as soft labels. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, so that the student model closely mimics the teacher's performance. To validate the KD approach, we used a large dataset, 7T, containing 6,684 student-written responses to science questions, and three mathematical reasoning datasets with student-written responses graded by human experts. We compared accuracy with state-of-the-art (SOTA) distilled models, TinyBERT, and artificial neural network (ANN) models. Results show that the KD approach achieves 1% and 4% higher scoring accuracy than ANN and TinyBERT, respectively, and accuracy comparable to the teacher model. Furthermore, the student model has 0.02M parameters, making it 10,000 times smaller than the teacher model and 10 times faster at inference than TinyBERT. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.
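The abstract does not spell out the specialized loss function, so the following is only a minimal sketch of one common way to train a student on a teacher's soft labels: a weighted sum of the cross-entropy against the human-assigned score and the KL divergence from the teacher's probabilities to the student's. The function names, the weighting parameter `alpha`, and the example logits are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student); terms with zero teacher mass contribute nothing."""
    return sum(
        p * math.log(p / (q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


def distillation_loss(student_logits, teacher_probs, hard_label, alpha=0.5):
    """Hypothetical KD loss (assumed form, not taken from the paper).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    soft-label term that pulls the student toward the teacher's probabilities.
    """
    student_probs = softmax(student_logits)
    hard_ce = -math.log(student_probs[hard_label] + 1e-12)
    soft_kl = kl_divergence(teacher_probs, student_probs)
    return alpha * hard_ce + (1.0 - alpha) * soft_kl


# Illustrative values only: a three-level scoring rubric where the teacher
# LLM is fairly confident in score level 0.
teacher = softmax([2.0, 0.5, -1.0])
close_student = distillation_loss([2.0, 0.5, -1.0], teacher, hard_label=0)
far_student = distillation_loss([-1.0, 0.5, 2.0], teacher, hard_label=0)
print(close_student < far_student)
```

In this sketch, a student whose output distribution matches the teacher's incurs no soft-label penalty, so the remaining loss comes only from the hard-label term; a student that disagrees with the teacher is penalized on both terms.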
