Abstract
Sparse Mixture-of-Experts (SMoE) models represent a significant advancement in large language model (LLM) development through their efficient parameter utilization. These models achieve substantial performance improvements at reduced inference costs. However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. HC-SMoE introduces a novel hierarchical clustering approach based on expert outputs to ensure merging robustness independent of routing decisions. The proposed output-based clustering method enables effective capture of functional relationships between experts for large-scale architectures. We provide theoretical analysis and comprehensive evaluations across multiple zero-shot language tasks to demonstrate HC-SMoE's effectiveness in state-of-the-art models including Qwen and Mixtral. The experimental results validate HC-SMoE's superior performance and practical applicability for real-world deployments.
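
To make the idea of output-based expert clustering concrete, the following is a minimal sketch and not the HC-SMoE implementation. It rests on several assumptions that the abstract does not specify: each expert is a toy single-layer network with a tanh activation, an expert is summarized by its mean output over a small calibration batch, clusters are formed by average-linkage agglomerative clustering on Euclidean distance, and experts within a cluster are merged by plain parameter averaging. All function and variable names are hypothetical.

```python
import numpy as np

def mean_expert_outputs(expert_weights, calib_inputs):
    """Summarize each toy expert by its mean output on a calibration batch.

    expert_weights: (n_experts, d_in, d_out); calib_inputs: (n_tokens, d_in).
    Assumption: an expert is a single linear map followed by tanh.
    """
    outputs = np.tanh(np.einsum("td,edo->eto", calib_inputs, expert_weights))
    return outputs.mean(axis=1)  # shape (n_experts, d_out)

def agglomerative_clusters(features, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage choice)."""
    clusters = [[i] for i in range(len(features))]
    # Pairwise Euclidean distances between expert output summaries.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters; a < b, so deleting b keeps a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def merge_experts(expert_weights, clusters):
    """Merge each cluster by averaging its members' weights (assumed rule)."""
    return np.stack([expert_weights[c].mean(axis=0) for c in clusters])

# Toy demo: four experts, built as two near-duplicate pairs.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=(2, 8, 4))
weights = np.stack([
    base_a,
    base_a + 0.01 * rng.normal(size=(8, 4)),
    base_b,
    base_b + 0.01 * rng.normal(size=(8, 4)),
])
calib = rng.normal(size=(32, 8))

features = mean_expert_outputs(weights, calib)
clusters = agglomerative_clusters(features, n_clusters=2)
merged = merge_experts(weights, clusters)
print(clusters, merged.shape)
```

Because the grouping depends only on what the experts compute on the calibration inputs, no routing statistics enter the procedure. In this toy setting the near-duplicate experts are expected to fall into the same cluster, halving the number of experts without any retraining step.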