Abstract
While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teacher tool usage, but they often struggle to train student agents to dynamically plan and act in novel environments. We propose AgentDistill, a novel, training-free agent distillation framework that enables efficient and scalable knowledge transfer via direct reuse of Model-Context-Protocols (MCPs), which are structured and reusable task-solving modules autonomously generated by teacher agents. The reuse of these distilled MCPs enables student agents to generalize their capabilities across domains and solve new problems with minimal supervision or human intervention. Experiments on biomedical and mathematical benchmarks demonstrate that our distilled student agents, built on small language models, can achieve performance comparable to advanced systems using large LLMs such as OctoTools (GPT-4o), highlighting the effectiveness of our framework in building scalable and cost-efficient intelligent agents.