AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Abstract

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.
