Abstract
We propose a systematic framework for building data-driven stochastic differential equation (SDE) models from sparse, noisy observations. Unlike traditional parametric approaches, which assume a known functional form for the drift, our goal is to learn the entire drift function directly from data without strong structural assumptions, which makes the framework especially relevant in scientific disciplines where system dynamics are only partially understood or highly complex. We cast the estimation problem as minimization of a penalized negative log-likelihood functional over a reproducing kernel Hilbert space (RKHS). In the sparse observation regime, the presence of unobserved trajectory segments makes the SDE likelihood intractable. To address this, we develop an Expectation-Maximization (EM) algorithm that employs a novel Sequential Monte Carlo (SMC) method to approximate the filtering distribution and generate Monte Carlo estimates of the E-step objective. The M-step then reduces to a penalized empirical risk minimization problem in the RKHS, whose minimizer is given by a finite linear combination of kernel functions via a generalized representer theorem. To control model complexity across EM iterations, we also develop a hybrid Bayesian variant of the algorithm that uses shrinkage priors to identify significant coefficients in the kernel expansion. We establish convergence results for both the exact and approximate EM sequences. The resulting EM-SMC-RKHS procedure enables accurate estimation of the drift function of stochastic dynamical systems in low-data regimes and is broadly applicable across domains requiring continuous-time modeling under observational constraints. We demonstrate the effectiveness of our method through a series of numerical experiments.
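To make the representer-theorem step concrete, the following is a minimal sketch of penalized empirical risk minimization for a drift function. It is not the EM-SMC-RKHS procedure itself. It assumes a one-dimensional, fully observed path (so no E-step or SMC is needed), an Euler-Maruyama discretization with a known constant diffusion coefficient, and a Gaussian kernel. Under those assumptions the negative log-likelihood is a squared loss on scaled increments, up to constants, and the minimizer is a kernel expansion over the observed states. The names `rbf_kernel`, `fit_drift`, `lam`, and `lengthscale` are hypothetical and chosen for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Gaussian kernel matrix between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def fit_drift(path, dt, lam=1e-3, lengthscale=1.0):
    """Kernel ridge fit of the drift from a fully observed path (assumption).

    Minimizes (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2 with
    y_i = (x_{i+1} - x_i) / dt. The representer theorem gives
    f(z) = sum_j alpha_j * k(z, x_j), with alpha = (K + n*lam*I)^{-1} y.
    """
    states = path[:-1]
    targets = np.diff(path) / dt
    n = states.size
    gram = rbf_kernel(states, states, lengthscale)
    alpha = np.linalg.solve(gram + n * lam * np.eye(n), targets)

    def drift_hat(z):
        z = np.atleast_1d(np.asarray(z, dtype=float))
        return rbf_kernel(z, states, lengthscale) @ alpha

    return drift_hat, alpha

# Toy data (assumption): Ornstein-Uhlenbeck path dX = -X dt + sigma dW,
# simulated with Euler-Maruyama so the true drift is b(x) = -x.
rng = np.random.default_rng(0)
dt, n_steps, sigma = 0.1, 2000, 0.5
path = np.zeros(n_steps + 1)
for i in range(n_steps):
    noise = sigma * np.sqrt(dt) * rng.standard_normal()
    path[i + 1] = path[i] - path[i] * dt + noise

drift_hat, alpha = fit_drift(path, dt)
print(drift_hat([-0.5, 0.0, 0.5]))  # estimates of b(x) = -x at three points
```

In the sparse regime described above, the squared-loss targets would no longer be observed directly; the sketch only illustrates why the M-step has a finite-dimensional solution once an objective of this form is available.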