Densing Law of LLMs

  • 2024-12-06 11:39:27
  • Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Biyuan Lin, Jie Zhou, Zhi Zheng, Xu Han, Zhiyuan Liu, Maosong Sun

Abstract

Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling brings great challenges to training and inference efficiency, particularly for deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of the LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law to predict the downstream performance of these reference models based on their parameter sizes. We then define the effective parameter size of the target LLM as the parameter size required by a reference model to achieve equivalent performance, and formalize the capacity density as the ratio of the effective parameter size to the actual parameter size of the target LLM. Capacity density provides a unified framework for assessing both model effectiveness and efficiency. Our further analysis of recent open-source base LLMs reveals an empirical law (the densing law) that the capacity density of LLMs grows exponentially over time. More specifically, using some widely used benchmarks for evaluation, the capacity density of LLMs doubles approximately every three months. The law provides new perspectives to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
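
The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are illustrative, and `toy_reference_curve` is a hypothetical monotone stand-in for the fitted scaling law, whose actual form the abstract does not give. The sketch inverts the reference curve to obtain an effective parameter size, takes the ratio to the actual parameter size, and projects density forward under the stated doubling period of roughly three months.

```python
import math
from typing import Callable


def toy_reference_curve(n_params: float) -> float:
    """Hypothetical monotone score-vs-size curve (NOT the paper's scaling law)."""
    return 1.0 - 1.0 / (1.0 + math.log10(n_params))


def effective_parameter_size(
    target_score: float,
    reference_curve: Callable[[float], float],
    lo: float = 1e6,
    hi: float = 1e13,
    iters: int = 200,
) -> float:
    """Reference-model size predicted to reach target_score.

    Assumes reference_curve is increasing in parameter count and that the
    target score lies within [reference_curve(lo), reference_curve(hi)].
    Uses bisection in log-parameter space.
    """
    if not reference_curve(lo) <= target_score <= reference_curve(hi):
        raise ValueError("target score is outside the searched size range")
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if reference_curve(math.exp(mid)) < target_score:
            log_lo = mid
        else:
            log_hi = mid
    return math.exp(log_hi)


def capacity_density(effective_params: float, actual_params: float) -> float:
    """Ratio of effective parameter size to actual parameter size."""
    return effective_params / actual_params


def projected_density(
    initial_density: float, months: float, doubling_months: float = 3.0
) -> float:
    """Exponential growth with the given doubling period (about 3 months per the abstract)."""
    return initial_density * 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical target: a 2B-parameter model scoring like a 4B reference model.
    score = toy_reference_curve(4e9)
    n_eff = effective_parameter_size(score, toy_reference_curve)
    print(capacity_density(n_eff, 2e9))
    print(projected_density(1.0, months=12))
```

Under this reading, a density above 1 means the target model matches a larger reference model, and a three-month doubling period corresponds to a 16-fold increase over one year.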

 
