BenTo: Benchmark Task Reduction with In-Context Transferability

  • 2024-10-18 04:15:21
  • Hongyu Zhao, Ming Li, Lichao Sun, Tianyi Zhou
Evaluating large language models (LLMs) is costly: it requires the generation and examination of LLM outputs on a large-scale benchmark of various tasks. This paper investigates how to efficiently reduce the tasks used to benchmark LLMs without affecting the evaluation quality. Our study reveals that task transferability and relevance provide critical information to identify the most representative subset of tasks via optimizing a facility location function. We propose a practically efficient metric for estimating the transferability between two tasks via in-context learning (ICL). By analyzing the pairwise transferability, we can reduce tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% while inducing only a <4% difference to the evaluation on the original benchmark. Compared to prior works, our method is training-free, gradient-free, and highly efficient, requiring ICL only.
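To make the facility location idea concrete, here is a minimal sketch of selecting a representative task subset from a pairwise transferability matrix. It is an illustration under stated assumptions and not necessarily the authors' exact procedure: the matrix `toy_sim` is made-up data rather than ICL-estimated transferability, the convention that `sim[i, j]` scores how well task `j` represents task `i` is assumed, and the function names are hypothetical. The selection uses the standard greedy heuristic for monotone submodular objectives such as facility location.

```python
import numpy as np


def facility_location_value(sim, selected):
    """Sum over all tasks of the best similarity to any selected task."""
    if len(selected) == 0:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())


def greedy_facility_location(sim, k):
    """Greedily pick k columns (tasks) that maximize the facility location value.

    Assumes sim is a square array of non-negative scores, where sim[i, j]
    says how well task j represents task i (an assumed convention).
    """
    n = sim.shape[0]
    selected = []
    taken = np.zeros(n, dtype=bool)
    best_cover = np.zeros(n)  # best score each task currently gets from the selection
    for _ in range(k):
        # Marginal gain of adding each candidate column j.
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[taken] = -np.inf  # never re-select a task
        j = int(np.argmax(gains))
        selected.append(j)
        taken[j] = True
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected


# Toy (made-up) transferability scores: tasks 0-2 form one cluster, tasks 3-4 another.
toy_sim = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.10],
    [0.90, 1.00, 0.85, 0.10, 0.10],
    [0.80, 0.85, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.80],
    [0.10, 0.10, 0.10, 0.90, 1.00],
])

chosen = greedy_facility_location(toy_sim, k=2)
print(chosen, facility_location_value(toy_sim, chosen))
```

On this toy input the greedy procedure picks one task from each cluster, which is the behavior the abstract relies on: a small subset whose members jointly "cover" the remaining tasks through high transferability.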


