Abstract: We present a novel data-driven framework for estimating the response of higher-order moments of nonlinear stochastic systems to small external perturbations. The classical Generalized Fluctuation-Dissipation Theorem (GFDT) links the unperturbed steady-state distribution to the system's linear response. Standard implementations rely on Gaussian approximations, which often predict the mean response accurately but usually introduce significant biases in higher-order moments such as variance, skewness, and kurtosis. To address this limitation, we combine GFDT with recent advances in score-based generative modeling, which enable direct estimation of the score function from data without requiring full density reconstruction. Our method is validated on three reduced-order stochastic models relevant to climate dynamics: a scalar stochastic model for low-frequency climate variability, a slow-fast triad model mimicking key features of the El Niño-Southern Oscillation (ENSO), and a six-dimensional stochastic barotropic model capturing atmospheric regime transitions. In all cases, the approach captures strongly nonlinear and non-Gaussian features of the system's response, outperforming traditional Gaussian approximations.
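As a rough illustration of the GFDT estimator described in this abstract, the sketch below computes the response of an observable A to a small constant forcing as R(t) = -E[A(x(t)) s(x(0))], where s is the score (the gradient of the log steady-state density). Everything in it is an assumption made for illustration: the system is a toy Ornstein-Uhlenbeck process rather than one of the three models in the abstract, the function name `gfdt_response` is hypothetical, and the score is written analytically, whereas the method in the abstract would estimate it from data with a score-based generative model.

```python
import numpy as np

def gfdt_response(x0, xt, observable, score):
    """Monte Carlo estimate of R(t) = -E[A(x(t)) * s(x(0))].

    x0, xt     : samples of the unperturbed process at time 0 and at lag t
    observable : function A(x)
    score      : function s(x) = d/dx log rho(x) of the steady state
    Assumes a perturbation that adds a small constant to the drift.
    """
    return -np.mean(observable(xt) * score(x0))

# Toy system (assumed): dx = -theta * x dt + sigma dW, with a Gaussian
# steady state of variance sigma^2 / (2 * theta).
rng = np.random.default_rng(0)
theta, sigma = 1.0, 1.0
var = sigma**2 / (2.0 * theta)
n_samples = 200_000
lag = 1.0

# Draw x(0) from the steady state, then use the exact OU transition to lag t.
x0 = rng.normal(0.0, np.sqrt(var), size=n_samples)
decay = np.exp(-theta * lag)
xt = decay * x0 + np.sqrt(var * (1.0 - decay**2)) * rng.normal(size=n_samples)

# For this Gaussian toy case the score is known in closed form; a learned
# score model would be substituted here in a data-driven setting.
def gaussian_score(x):
    return -x / var

mean_response = gfdt_response(x0, xt, lambda x: x, gaussian_score)
second_moment_response = gfdt_response(x0, xt, lambda x: x**2, gaussian_score)

# For this linear system the mean response should be close to
# exp(-theta * lag), and the second-moment response close to zero.
print(mean_response, second_moment_response)
```

In this linear Gaussian case the exact score and the Gaussian approximation coincide, so the example only shows the mechanics of the estimator. The differences the abstract reports would appear only for non-Gaussian steady states.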
Abstract: One of the greatest sources of uncertainty in future climate projections comes from limitations in modelling clouds and in understanding how different cloud types interact with the climate system. A key first step in reducing this uncertainty is to accurately classify cloud types at high spatial and temporal resolution. In this paper, we introduce Cumulo, a benchmark dataset for training and evaluating global cloud classification models. It consists of one year of 1 km resolution MODIS hyperspectral imagery merged with pixel-width 'tracks' of CloudSat cloud labels. Bringing these complementary datasets together is a crucial first step, enabling the machine learning community to develop innovative new techniques which could greatly benefit the climate community. To showcase Cumulo, we provide a baseline performance analysis using an invertible flow generative model (IResNet), which further allows us to discover new sub-classes for a given cloud class by exploring the latent space. To compare methods, we introduce a set of evaluation criteria for identifying models that are not only accurate but also physically realistic.
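Because the labels in such a dataset exist only along pixel-width tracks, a classifier's predictions can be scored only on the labelled pixels. The sketch below illustrates that idea on a tiny synthetic tile. The array layout, the `-1` sentinel for unlabelled pixels, and the function name `masked_accuracy` are assumptions made for this example and do not describe Cumulo's actual file format or its evaluation criteria.

```python
import numpy as np

def masked_accuracy(pred, labels, unlabeled=-1):
    """Accuracy computed only over pixels that carry a label.

    pred, labels : integer arrays of identical shape (height, width)
    unlabeled    : sentinel marking pixels without a label (assumed)
    """
    mask = labels != unlabeled
    if not mask.any():
        return float("nan")  # no labelled pixels in this tile
    return float(np.mean(pred[mask] == labels[mask]))

# Synthetic 4x5 tile: labels exist only along a one-pixel-wide column.
labels = np.full((4, 5), -1)
labels[:, 2] = [0, 1, 1, 3]

# Hypothetical model output covering every pixel of the tile.
pred = np.zeros((4, 5), dtype=int)
pred[:, 2] = [0, 1, 2, 3]

acc = masked_accuracy(pred, labels)
print(acc)  # 3 of the 4 labelled pixels match, so 0.75
```

Restricting the metric to the mask keeps unlabelled pixels from being counted as either correct or incorrect.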