FinDiff: Diffusion Models for Financial Tabular Data Generation

  • 2023-09-04 10:30:15
  • Timur Sattarov, Marco Schreyer, Damian Borth

Abstract

The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, capable of synthesizing data mimicking the underlying distributions of real-world data presents a compelling solution. This work introduces 'FinDiff', a diffusion model designed to generate real-world financial tabular data for a variety of regulatory downstream tasks, for example economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to model mixed modality financial data, comprising both categorical and numeric attributes. The performance of FinDiff in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models using three real-world financial datasets (including two publicly available datasets and one proprietary dataset). Empirical results demonstrate that FinDiff excels in generating synthetic tabular financial data with high fidelity, privacy, and utility.

 
