Abstract:Stochastic reduced-order models are widely used to represent the effective dynamics of complex systems, but estimating their drift and diffusion coefficients from data remains challenging. Standard approaches often rely on short-time trajectory increments, state-space partitioning, or repeated simulation of candidate models, which become unreliable or computationally expensive for high-dimensional systems, coarse temporal sampling, or unevenly sampled data. We introduce a data-driven calibration method based on a novel relationship between the coefficients of a stochastic reduced model and the conditional score of the finite-time transition density, defined as the gradient of the logarithm of the transition density with respect to the initial state. The resulting identity expresses derivatives of lagged correlation functions as stationary expectations over observed lagged pairs involving this conditional score and the unknown model coefficients. This formulation allows the drift and diffusion structure to be constrained directly from finite-lag statistics, without differentiating trajectories, partitioning state space, or repeatedly integrating candidate reduced models during calibration, yielding a least-squares fitting problem over stationary lagged pairs. We validate the approach on analytically tractable and data-driven nonequilibrium diffusions, demonstrating that the inferred models preserve the invariant statistics while accurately reproducing finite-lag dynamical correlations. The framework provides a scalable route for learning stochastic reduced-order models from data that reproduce prescribed statistical and dynamical properties.




Abstract:We present a novel data-driven framework for estimating the response of higher-order moments of nonlinear stochastic systems to small external perturbations. The classical Generalized Fluctuation-Dissipation Theorem (GFDT) links the unperturbed steady-state distribution to the system's linear response. Standard implementations rely on Gaussian approximations, which can often accurately predict the mean response but usually introduce significant biases in higher-order moments, such as variance, skewness, and kurtosis. To address this limitation, we combine GFDT with recent advances in score-based generative modeling, which enable direct estimation of the score function from data without requiring full density reconstruction. Our method is validated on three reduced-order stochastic models relevant to climate dynamics: a scalar stochastic model for low-frequency climate variability, a slow-fast triad model mimicking key features of the El Nino-Southern Oscillation (ENSO), and a six-dimensional stochastic barotropic model capturing atmospheric regime transitions. In all cases, the approach captures strongly nonlinear and non-Gaussian features of the system's response, outperforming traditional Gaussian approximations.