Abstract
Medical datasets are typically affected by issues such as missing values, class imbalance, heterogeneous feature types, and a high number of features versus a relatively small number of samples, which prevent machine learning models from obtaining proper results in classification and regression tasks. This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges, minimizing user intervention and identifying the optimal combination of preprocessing techniques and predictive models. AutoML-Med's architecture incorporates Latin Hypercube Sampling (LHS) for exploring preprocessing methods, trains models using selected metrics, and uses the Partial Rank Correlation Coefficient (PRCC) for fine-tuned optimization of the most influential preprocessing steps. Experimental results demonstrate AutoML-Med's effectiveness in two different clinical settings, where it achieves higher balanced accuracy and sensitivity, both crucial for identifying at-risk patients, than other state-of-the-art tools. AutoML-Med's ability to improve prediction results, especially in medical datasets with sparse data and class imbalance, highlights its potential to streamline machine learning applications in healthcare.