AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data

Abstract

Medical datasets are typically affected by issues such as missing values, class imbalance, heterogeneous feature types, and a high number of features relative to a small number of samples, all of which prevent machine learning models from performing well in classification and regression tasks. This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges by minimizing user intervention and identifying the optimal combination of preprocessing techniques and predictive models. AutoML-Med's architecture uses Latin Hypercube Sampling (LHS) to explore preprocessing methods, trains models using selected metrics, and applies the Partial Rank Correlation Coefficient (PRCC) to fine-tune the most influential preprocessing steps. Experimental results demonstrate AutoML-Med's effectiveness in two different clinical settings, where it achieves higher balanced accuracy and sensitivity, both crucial for identifying at-risk patients, than other state-of-the-art tools. AutoML-Med's ability to improve prediction results, especially in medical datasets with sparse data and class imbalance, highlights its potential to streamline Machine Learning applications in healthcare.
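
To make the LHS and PRCC steps named in the abstract concrete, the following is a minimal sketch and not AutoML-Med's actual implementation. It assumes three hypothetical preprocessing parameters scaled to the unit interval, and it replaces model training with a synthetic score function (`toy_score`, an illustrative stand-in). It draws a Latin Hypercube design, scores each configuration, and ranks the parameters by PRCC.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) Latin Hypercube design in [0, 1)."""
    # Each dimension is split into n_samples equal strata; every stratum is
    # used exactly once, in an independently shuffled order per dimension.
    strata = np.array([rng.permutation(n_samples) for _ in range(n_dims)]).T
    return (strata + rng.random((n_samples, n_dims))) / n_samples

def _ranks(a):
    # Ranks along axis 0; assumes no ties, which holds for continuous samples.
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def prcc(X, y):
    """Partial rank correlation of each column of X with y."""
    Xr, yr = _ranks(X), _ranks(y)
    n, d = Xr.shape
    out = np.empty(d)
    for j in range(d):
        # Regress out the (ranked) remaining parameters from both x_j and y,
        # then correlate the two residual vectors.
        others = np.column_stack([np.ones(n), np.delete(Xr, j, axis=1)])
        res_x = Xr[:, j] - others @ np.linalg.lstsq(others, Xr[:, j], rcond=None)[0]
        res_y = yr - others @ np.linalg.lstsq(others, yr, rcond=None)[0]
        out[j] = np.corrcoef(res_x, res_y)[0, 1]
    return out

def toy_score(design, rng):
    # Synthetic stand-in for "train a model and measure balanced accuracy":
    # parameter 0 helps strongly, parameter 1 hurts mildly, parameter 2 is inert.
    noise = rng.normal(0.0, 0.01, size=len(design))
    return 0.6 + 0.3 * design[:, 0] - 0.05 * design[:, 1] + noise

rng = np.random.default_rng(0)
n_configs = 200
design = latin_hypercube(n_configs, 3, rng)   # hypothetical preprocessing settings
scores = toy_score(design, rng)
coef = prcc(design, scores)

# Parameters with the largest |PRCC| would be the candidates for finer tuning.
for name, c in zip(["param_0", "param_1", "param_2"], coef):
    print(f"{name}: PRCC = {c:+.3f}")
```

In a real pipeline each column of `design` would be mapped to an actual preprocessing choice (for example an imputation or resampling setting), and `toy_score` would be replaced by training and evaluating a model on the preprocessed data.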
