Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Abstract

While large language models (LLMs) excel on generation tasks, their decoder-only architecture often limits their potential as embedding models if no further representation finetuning is applied. Does this contradict their claim to be generalists? To answer the question, we take a closer look at Mixture-of-Experts (MoE) LLMs. Our study shows that the expert routers in MoE LLMs can serve as an off-the-shelf embedding model with promising performance on a diverse class of embedding-focused tasks, without requiring any finetuning. Moreover, our extensive analysis shows that the MoE routing weights (RW) are complementary to the hidden state (HS) of LLMs, a widely used embedding. Compared to HS, we find that RW is more robust to the choice of prompts and focuses on high-level semantics. Motivated by the analysis, we propose MoEE, which combines RW and HS and achieves better performance than using either separately. Our exploration of their combination and of the prompting strategy yields several novel insights, e.g., a weighted sum of RW and HS similarities outperforms the similarity on their concatenation. Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement brought by MoEE to LLM-based embedding without further finetuning.
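The two ways of combining RW and HS contrasted in the abstract can be illustrated with a minimal sketch. It assumes the RW and HS vectors for each text have already been extracted from an MoE LLM. The function names, the mixing weight `alpha`, and the exact form of the weighted sum (HS similarity plus `alpha` times RW similarity) are hypothetical choices made for illustration and are not taken from the abstract.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def weighted_sum_similarity(hs_a, rw_a, hs_b, rw_b, alpha=0.5):
    """Compute HS and RW similarities separately, then mix them.

    `alpha` is a hypothetical mixing weight; this particular weighting
    is an assumption, not the paper's stated formula.
    """
    return cosine(hs_a, hs_b) + alpha * cosine(rw_a, rw_b)


def concat_similarity(hs_a, rw_a, hs_b, rw_b):
    """Concatenate HS and RW into one vector per text, then compare once."""
    emb_a = np.concatenate([hs_a, rw_a])
    emb_b = np.concatenate([hs_b, rw_b])
    return cosine(emb_a, emb_b)


if __name__ == "__main__":
    # Toy vectors standing in for real model outputs (dimensions are arbitrary).
    rng = np.random.default_rng(0)
    hs_a, hs_b = rng.normal(size=16), rng.normal(size=16)
    rw_a, rw_b = rng.random(size=8), rng.random(size=8)

    print("weighted sum :", weighted_sum_similarity(hs_a, rw_a, hs_b, rw_b))
    print("concatenation:", concat_similarity(hs_a, rw_a, hs_b, rw_b))
```

One practical difference the sketch makes visible: in the concatenated form, whichever vector has the larger norm dominates the single cosine score, whereas the weighted sum normalizes each similarity separately before mixing.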
