Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation

  • 2025-11-03 16:31:16
  • Cécile Rousseau, Tobia Boschi, Giandomenico Cornacchia, Dhaval Salwala, Alessandra Pascale, Juan Bernabe Moreno

Abstract

SDForger is a flexible and efficient framework for generating high-quality multivariate time series using LLMs. Leveraging a compact data representation, SDForger provides synthetic time series generation from a few samples and low-computation fine-tuning of any autoregressive LLM. Specifically, the framework transforms univariate and multivariate signals into tabular embeddings, which are then encoded into text and used to fine-tune the LLM. At inference, new textual embeddings are sampled and decoded into synthetic time series that retain the original data's statistical properties and temporal dynamics. Across a diverse range of datasets, SDForger outperforms existing generative models in many scenarios, both in similarity-based evaluations and downstream forecasting tasks. By enabling textual conditioning in the generation process, SDForger paves the way for multimodal modeling and the streamlined integration of time series with textual information. The model is open-sourced at https://github.com/IBM/fms-dgt/tree/main/fms_dgt/public/databuilders/time_series.
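The embed, encode-to-text, and decode steps described in the abstract can be illustrated with a minimal sketch. It is not the SDForger implementation: the abstract does not say how the tabular embeddings are computed, so plain PCA (via NumPy's SVD) is assumed here, and the text template and every function name (`fit_basis`, `embed`, `to_text`, `from_text`, `decode`) are hypothetical. The LLM fine-tuning and sampling stage is left out; in the framework, the text lines would come from the fine-tuned model rather than from the training rows.

```python
import numpy as np

def fit_basis(windows, k):
    """Return the mean window and the top-k principal directions (assumed PCA)."""
    mean = windows.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(windows - mean, full_matrices=False)
    return mean, vt[:k]

def embed(windows, mean, basis):
    """Project each window onto the basis: one row of k coefficients per window."""
    return (windows - mean) @ basis.T

def to_text(row, decimals=4):
    """Encode one embedding row as a text line (hypothetical template)."""
    return ", ".join(f"value_{i} is {v:.{decimals}f}" for i, v in enumerate(row))

def from_text(line):
    """Parse a text line produced by to_text back into a coefficient vector."""
    return np.array([float(part.split(" is ")[1]) for part in line.split(", ")])

def decode(embeddings, mean, basis):
    """Map coefficient rows back to the time domain."""
    return embeddings @ basis + mean

# Toy data: 20 noisy sine windows of length 64 with random phases.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 64)
windows = np.stack([
    np.sin(t + rng.uniform(0, 2 * np.pi)) + 0.05 * rng.normal(size=t.size)
    for _ in range(20)
])

mean, basis = fit_basis(windows, k=3)
emb = embed(windows, mean, basis)             # tabular embeddings, shape (20, 3)
lines = [to_text(row) for row in emb]         # text rows an LLM could be fine-tuned on
recovered = np.stack([from_text(line) for line in lines])
reconstructed = decode(recovered, mean, basis)  # back to series, shape (20, 64)
```

Each window is reduced to three numbers, so the text the LLM sees stays short regardless of the series length, which is one plausible reading of the "compact data representation" the abstract mentions.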

 
