TinyLLM: Learning a Small Student from Multiple Large Language Models

  • 2024-04-01 02:28:48
  • Yijun Tian, Yikun Han, Xiusi Chen, Wei Wang, Nitesh V. Chawla

Abstract

Transferring the reasoning capability from stronger large language models (LLMs) to smaller ones has been quite appealing, as smaller LLMs are more flexible to deploy with less expense. Among the existing solutions, knowledge distillation stands out due to its outstanding efficiency and generalization. However, existing methods suffer from several drawbacks, including limited knowledge diversity and the lack of rich contextual information. To solve the problems and facilitate the learning of compact language models, we propose TinyLLM, a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs. In particular, we encourage the student LLM to not only generate the correct answers but also understand the rationales behind these answers. Given that different LLMs possess diverse reasoning skills, we guide the student model to assimilate knowledge from various teacher LLMs. We further introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Extensive experiments on six datasets across two reasoning tasks demonstrate the superiority of our method. Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size.
