PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes

  • 2025-07-28 16:22:50
  • Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee

Abstract

The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While synthetic data generation has succeeded in text and image domains, synthetic graph generation remains challenging, especially for real-world graphs with complex, heterogeneous schemas. Existing research has focused mostly on homogeneous structures with simple attributes, limiting its usefulness and relevance for application domains requiring semantic fidelity. In this research, we introduce ProvCreator, a synthetic graph framework designed for complex heterogeneous graphs with high-dimensional node and edge attributes. ProvCreator formulates graph synthesis as a sequence generation task, enabling the use of transformer-based large language models. It features a versatile graph-to-sequence encoder-decoder that (1) losslessly encodes graph structure and attributes, (2) efficiently compresses large graphs for contextual modeling, and (3) supports end-to-end, learnable graph generation. To validate our research, we evaluate ProvCreator on two challenging domains: system provenance graphs in cybersecurity and knowledge graphs from the IntelliGraph benchmark dataset. In both cases, ProvCreator captures intricate dependencies between structure and semantics, enabling the generation of realistic and privacy-aware synthetic datasets.
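
To make the idea of a lossless graph-to-sequence encoding concrete, the following is a minimal sketch of a round-trip serializer for a small heterogeneous graph with typed, attributed nodes and edges. It is not ProvCreator's encoder: the token layout, the `<node>` and `<edge>` markers, the function names `encode_graph` and `decode_graph`, and the toy provenance-style data are all assumptions made for illustration, and the sketch performs no compression.

```python
import json

def encode_graph(nodes, edges):
    """Flatten a typed, attributed graph into a list of string tokens.

    nodes: dict mapping int node id -> (node_type, attrs_dict)
    edges: list of (src_id, dst_id, edge_type, attrs_dict)
    The token layout is hypothetical, chosen only for this illustration.
    """
    tokens = []
    for node_id in sorted(nodes):
        node_type, attrs = nodes[node_id]
        tokens += ["<node>", str(node_id), node_type,
                   json.dumps(attrs, sort_keys=True)]
    for src, dst, edge_type, attrs in edges:
        tokens += ["<edge>", str(src), str(dst), edge_type,
                   json.dumps(attrs, sort_keys=True)]
    return tokens

def decode_graph(tokens):
    """Invert encode_graph, rebuilding the node dict and edge list."""
    nodes, edges = {}, []
    i = 0
    while i < len(tokens):
        if tokens[i] == "<node>":
            node_id, node_type, attrs = tokens[i + 1:i + 4]
            nodes[int(node_id)] = (node_type, json.loads(attrs))
            i += 4
        elif tokens[i] == "<edge>":
            src, dst, edge_type, attrs = tokens[i + 1:i + 5]
            edges.append((int(src), int(dst), edge_type, json.loads(attrs)))
            i += 5
        else:
            raise ValueError(f"unexpected token at position {i}: {tokens[i]!r}")
    return nodes, edges

# Toy provenance-style graph (invented data): a process reads a file.
nodes = {
    0: ("process", {"cmdline": "cat notes.txt", "pid": 4242}),
    1: ("file", {"path": "/home/user/notes.txt"}),
}
edges = [(0, 1, "read", {"bytes": 128})]

tokens = encode_graph(nodes, edges)
restored_nodes, restored_edges = decode_graph(tokens)
```

Because every node and edge attribute is carried in the token list, decoding recovers the original graph exactly; a sequence model trained on such token lists could in principle emit new sequences that decode into new graphs, which is the general formulation the abstract describes.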

 
