Abstract
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.