Abstract
Deep learning (DL) architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLPs) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major but so far overlooked opportunity for substantially improving tabular MLPs: parameter-efficient ensembling, a paradigm for implementing an ensemble of models as a single model that produces multiple predictions. We start by developing TabM, a simple model based on an MLP and our variations of BatchEnsemble (an existing technique). Then, we perform a large-scale evaluation of tabular DL architectures on public benchmarks in terms of both task performance and efficiency, which casts the landscape of tabular DL in a new light. Generally, we show that MLPs, including TabM, form a line of stronger and more practical models than attention- and retrieval-based architectures. In particular, we find that TabM achieves the best performance among tabular DL models. Lastly, we conduct an empirical analysis of the ensemble-like nature of TabM. For example, we observe that the multiple predictions of TabM are weak individually but powerful collectively. Overall, our work brings an impactful technique to tabular DL, analyses its behaviour, and advances the performance-efficiency trade-off with TabM, a simple and powerful baseline for researchers and practitioners.
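Parameter-efficient ensembling can be illustrated with a minimal sketch of one BatchEnsemble-style linear layer, written here in Python with NumPy (an assumption). All k members share one weight matrix, and each member adds only its own input-scaling vector, output-scaling vector and bias. This is not the TabM implementation, and it does not reproduce the authors' variations of BatchEnsemble. The function name `batch_ensemble_linear` and the parameter names `r`, `s` and `b` are hypothetical. The k predictions are averaged at the end, which is the usual way to combine ensemble members.

```python
import numpy as np


def batch_ensemble_linear(x, W, r, s, b):
    """One BatchEnsemble-style linear layer (hypothetical helper).

    x: (k, n, d_in)   one input copy per ensemble member
    W: (d_in, d_out)  weight matrix shared by all k members
    r: (k, d_in)      per-member input scaling
    s: (k, d_out)     per-member output scaling
    b: (k, d_out)     per-member bias
    Member i computes ((x_i * r_i) @ W) * s_i + b_i, which equals
    x_i @ (W * outer(r_i, s_i)) + b_i: a rank-1 rescaled copy of W.
    """
    return ((x * r[:, None, :]) @ W) * s[:, None, :] + b[:, None, :]


rng = np.random.default_rng(0)
k, n, d_in, d_out = 4, 3, 5, 2           # members, objects, input and output sizes
x = rng.normal(size=(n, d_in))
xk = np.broadcast_to(x, (k, n, d_in))    # every member sees the same objects
W = rng.normal(size=(d_in, d_out))
r = rng.normal(size=(k, d_in))
s = rng.normal(size=(k, d_out))
b = rng.normal(size=(k, d_out))

preds = batch_ensemble_linear(xk, W, r, s, b)  # (k, n, d_out): k predictions per object
final = preds.mean(axis=0)                      # combine the members by averaging
```

The parameter savings come from the shared matrix. This layer stores d_in * d_out + k * (d_in + 2 * d_out) parameters, whereas k independent linear layers would store k * (d_in * d_out + d_out).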