RouterBench: A Benchmark for Multi-LLM Routing System

Abstract

As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs. Yet, the absence of a standardized benchmark for evaluating the performance of LLM routers hinders progress in this area. To bridge this gap, we present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies. We further propose a theoretical framework for LLM routing, and deliver a comparative analysis of various routing approaches through RouterBench, highlighting their potentials and limitations within our evaluation framework. This work not only formalizes and advances the development of LLM routing systems but also sets a standard for their assessment, paving the way for more accessible and economically viable LLM deployments. The code and data are available at https://github.com/withmartian/routerbench.
