AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

Abstract

We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
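
To make the idea of stacking models in layers concrete, the following is a minimal sketch and not AutoGluon's implementation. It assumes a one-dimensional regression problem, three toy base models (mean, least-squares line, 1-nearest-neighbour), and a second layer that is only a grid search over convex blending weights, standing in for the learned stacker models a real system would use. All function names are hypothetical. The point it illustrates is that the second layer is trained on out-of-fold predictions, so no base model is scored on rows it was fitted on.

```python
# Toy two-layer stacking sketch in pure Python. Every name below is
# hypothetical and illustrative; none of it is AutoGluon's API.

def fit_mean(xs, ys):
    """Base model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Base model: ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_1nn(xs, ys):
    """Base model: copy the target of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

BASE_FITTERS = [fit_mean, fit_line, fit_1nn]

def out_of_fold_predictions(xs, ys, k=3):
    """Layer 1: predict each row with base models that never saw that row."""
    n = len(xs)
    oof = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, fitter in enumerate(BASE_FITTERS):
            model = fitter(train_x, train_y)
            for i in held_out:
                oof[i][j] = model(xs[i])
    return oof

def blend(weights, base_preds):
    """Weighted sum of the base-model predictions for one row."""
    return sum(w * p for w, p in zip(weights, base_preds))

def fit_stacker(oof, ys, steps=10):
    """Layer 2: pick convex weights over the three base models by grid search,
    minimising squared error on the out-of-fold predictions."""
    best_weights, best_err = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            weights = [a / steps, b / steps, (steps - a - b) / steps]
            err = sum((blend(weights, row) - y) ** 2 for row, y in zip(oof, ys))
            if err < best_err:
                best_weights, best_err = weights, err
    return best_weights

def fit_stack(xs, ys, k=3):
    """Train both layers, then refit the base models on all rows."""
    weights = fit_stacker(out_of_fold_predictions(xs, ys, k), ys)
    models = [fitter(xs, ys) for fitter in BASE_FITTERS]
    return (lambda x: blend(weights, [m(x) for m in models])), weights

# Usage on a noiseless linear toy dataset (made up for illustration).
xs = [float(x) for x in range(9)]
ys = [2.0 * x + 1.0 for x in xs]
predict, weights = fit_stack(xs, ys)
```

On this noiseless linear toy data the stacker puts all of its weight on the line model, since that is the only base model whose held-out predictions are exact. With noisier data the weights would typically be spread across several base models, which is the case where combining models pays off.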
