Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

  • 2019-07-14 17:12:55
  • Feras A. Saad, Marco F. Cusumano-Towner, Ulrich Schaechtle, Martin C. Rinard, Vikash K. Mansinghka

Abstract

We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We provide a precise formulation of Bayesian synthesis for automatic data modeling that identifies sufficient conditions for the resulting synthesis procedure to be sound. We also derive a general class of synthesis algorithms for domain-specific languages specified by probabilistic context-free grammars and establish the soundness of our approach for these languages. We apply the techniques to automatically synthesize probabilistic programs for time series data and multivariate tabular data. First, we show how to analyze the structure of the synthesized programs to compute, for key qualitative properties of interest, the probability that the underlying data generating process exhibits each of these properties. Second, we translate probabilistic programs in the domain-specific language into probabilistic programs in Venture, a general-purpose probabilistic programming system. The translated Venture programs are then executed to obtain predictions of new time series data and new multivariate data records. Experimental results show that our techniques can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.

 
