Abstract
Obtaining real-world network datasets is often challenging because of privacy, security, and computational constraints. In the absence of such datasets, graph generative models become essential tools for creating synthetic datasets. In this paper, we introduce a novel machine learning model for generating high-fidelity synthetic network flow datasets that are representative of real-world networks. Our approach involves the generation of dynamic multigraphs using a stochastic Kronecker graph generator for structure generation and a tabular generative adversarial network for feature generation. We further employ an XGBoost (eXtreme Gradient Boosting) model for graph alignment, ensuring accurate overlay of features onto the generated graph structure. We evaluate our model using new metrics that assess both the accuracy and diversity of the synthetic graphs. Our results demonstrate improvements in accuracy over previous large-scale graph generation methods while maintaining similar efficiency. We also explore the trade-off between accuracy and diversity in synthetic graph dataset creation, a topic not extensively covered in related works. Our contributions include the synthesis and evaluation of large real-world netflow datasets and the definition of new metrics for evaluating synthetic graph generative models.
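
To make the structure-generation step more concrete, the following is a minimal sketch of how a stochastic Kronecker generator can sample the edges of a multigraph. It is an illustration under stated assumptions, not the implementation used in this work: the function name `sample_kronecker_edges`, the 2x2 initiator values, and the recursive quadrant-descent sampling scheme are all hypothetical choices made for the example. Repeated edges are kept on purpose, since the target structure is a multigraph.

```python
import random


def sample_kronecker_edges(initiator, levels, num_edges, seed=0):
    """Sample directed edges (repeats allowed) from a 2x2 initiator matrix.

    Each edge is placed by descending `levels` times into one of the four
    quadrants of the adjacency matrix, choosing a quadrant with probability
    proportional to the matching initiator entry. The resulting graph has
    2 ** levels nodes. All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = [initiator[r][c] for r, c in cells]

    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for _ in range(levels):
            # Pick a quadrant; it contributes one bit to each endpoint index.
            r, c = rng.choices(cells, weights=weights, k=1)[0]
            src = 2 * src + r
            dst = 2 * dst + c
        edges.append((src, dst))
    return edges


# Hypothetical initiator: dense core in the top-left, sparse bottom-right.
edges = sample_kronecker_edges([[0.9, 0.5], [0.5, 0.1]], levels=4, num_edges=20)
print(edges[:5])
```

In a pipeline of the kind described above, each sampled edge would then receive flow features from the tabular generator, with a separate alignment step deciding which feature row is attached to which edge.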