Abstract
Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data, and we also provide an overview of strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas, while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with eleven deep learning approaches across five popular real-world tabular data sets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that the research progress on competitive deep learning models for tabular data is stagnating. To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.