Abstract
Knowledge-intensive conversations supported by large language models (LLMs) have become one of the most popular and helpful applications, assisting people in various ways. Many current knowledge-intensive applications are centered on retrieval-augmented generation (RAG) techniques. While many open-source RAG frameworks facilitate the development of RAG-based applications, they often fall short in handling practical scenarios complicated by heterogeneous data in topics and formats, conversational context management, and the requirement of low-latency response times. This technical report presents a configurable knowledge-integrated multi-agent system, KIMAs, to address these challenges. KIMAs features a flexible and configurable system for integrating diverse knowledge sources with 1) context management and query rewrite mechanisms to improve retrieval accuracy and multi-turn conversational coherence, 2) efficient knowledge routing and retrieval, 3) simple but effective filter and reference generation mechanisms, and 4) optimized parallelizable multi-agent pipeline execution. Our work provides a scalable framework for advancing the deployment of LLMs in real-world settings. To show how KIMAs can help developers build knowledge-intensive applications of different scales and emphases, we demonstrate how we configure the system for three applications that are already running in practice with reliable performance.