Configurable Foundation Models: Building LLMs from a Modular Perspective

  • 2024-09-04 18:01:02
  • Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

Abstract

Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing for inference with part of modules and dynamic assembly of modules to tackle complex tasks, such as mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitation of configurable foundation models. We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase, and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely-used LLMs. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundational models.

 
