TabDDPM: Modelling Tabular Data with Diffusion Models

  • 2022-09-30 13:26:14
  • Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, Artem Babenko

Abstract

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently gained some attention in other domains, including speech, NLP, and graph-like data. In this work, we investigate if the framework of diffusion models can be advantageous for general tabular problems, where datapoints are typically represented by vectors of heterogeneous features. The inherent heterogeneity of tabular data makes it quite challenging for accurate modeling, since the individual features can be of completely different nature, i.e., some of them can be continuous and some of them can be discrete. To address such data types, we introduce TabDDPM -- a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. We extensively evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority over existing GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields. Additionally, we show that TabDDPM is eligible for privacy-oriented setups, where the original datapoints cannot be publicly shared.
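
The abstract does not spell out how continuous and discrete features are noised, so the following is only a minimal sketch of one way a forward diffusion step could treat a mixed-type row. It assumes Gaussian noise for continuous features and a multinomial (uniform-mixing) corruption for categorical ones. The function name `forward_noise_row`, its parameters, and the toy values are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def forward_noise_row(x_num, x_cat, num_classes, alpha_bar_t, rng):
    """Noise one tabular row at a timestep with cumulative signal level alpha_bar_t.

    Hypothetical helper for illustration only.
    x_num: 1-D float array of continuous features.
    x_cat: 1-D int array of categorical features (class indices).
    num_classes: number of classes for each categorical feature.
    """
    # Continuous features (assumption: Gaussian forward process):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.standard_normal(x_num.shape)
    x_num_t = np.sqrt(alpha_bar_t) * x_num + np.sqrt(1.0 - alpha_bar_t) * eps

    # Categorical features (assumption: multinomial forward process):
    # keep the original class with weight alpha_bar_t, otherwise
    # resample uniformly over that feature's classes.
    x_cat_t = np.empty_like(x_cat)
    for j, (value, k) in enumerate(zip(x_cat, num_classes)):
        probs = np.full(k, (1.0 - alpha_bar_t) / k)
        probs[value] += alpha_bar_t
        x_cat_t[j] = rng.choice(k, p=probs)
    return x_num_t, x_cat_t

# Toy row: two continuous features, two categorical features with 3 and 4 classes.
rng = np.random.default_rng(0)
x_num = np.array([0.5, -1.2])
x_cat = np.array([2, 0])
x_num_t, x_cat_t = forward_noise_row(x_num, x_cat, [3, 4], alpha_bar_t=0.9, rng=rng)
```

With `alpha_bar_t` close to 1 the row is nearly unchanged, and as it approaches 0 the continuous part tends to standard Gaussian noise while each categorical feature tends to a uniform draw. A generative model of this kind would be trained to reverse that corruption.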

 
