Abstract:The proliferation of data-driven models in weather and climate sciences has marked a significant paradigm shift, with advanced models demonstrating exceptional skill in medium-range forecasting. However, these models are often limited by long-term instabilities, climatological drift, and substantial computational costs during training and inference, restricting their broader application for climate studies. Addressing these limitations, Guan et al. (2024) introduced LUCIE, a lightweight, physically consistent climate emulator utilizing a Spherical Fourier Neural Operator (SFNO) architecture. This model is able to reproduce accurate long-term statistics including climatological mean and seasonal variability. However, LUCIE's native resolution (~300 km) is inadequate for detailed regional impact assessments. To overcome this limitation, we introduce a deep learning-based downscaling framework, leveraging probabilistic diffusion-based generative models with conditional and posterior sampling frameworks. These models downscale coarse LUCIE outputs to 25 km resolution. They are trained on approximately 14,000 ERA5 timesteps spanning 2000-2009 and evaluated on LUCIE predictions from 2010 to 2020. Model performance is assessed through diverse metrics, including latitude-averaged RMSE, power spectrum, probability density functions and First Empirical Orthogonal Function of the zonal wind. We observe that the proposed approach is able to preserve the coarse-grained dynamics from LUCIE while generating fine-scaled climatological statistics at ~28km resolution.
Abstract:Data-driven weather models have recently achieved state-of-the-art performance, yet progress has plateaued in recent years. This paper introduces a Mixture of Experts (MoWE) approach as a novel paradigm to overcome these limitations, not by creating a new forecaster, but by optimally combining the outputs of existing models. The MoWE model is trained with significantly lower computational resources than the individual experts. Our model employs a Vision Transformer-based gating network that dynamically learns to weight the contributions of multiple "expert" models at each grid point, conditioned on forecast lead time. This approach creates a synthesized deterministic forecast that is more accurate than any individual component in terms of Root Mean Squared Error (RMSE). Our results demonstrate the effectiveness of this method, achieving up to a 10% lower RMSE than the best-performing AI weather model on a 2-day forecast horizon, significantly outperforming individual experts as well as a simple average across experts. This work presents a computationally efficient and scalable strategy to push the state of the art in data-driven weather prediction by making the most out of leading high-quality forecast models.
Abstract:Predicting the long-term behavior of chaotic systems remains a formidable challenge due to their extreme sensitivity to initial conditions and the inherent limitations of traditional data-driven modeling approaches. This paper introduces a novel framework that addresses these challenges by leveraging the recently proposed multi-step penalty (MP) optimization technique. Our approach extends the applicability of MP optimization to a wide range of deep learning architectures, including Fourier Neural Operators and UNETs. By introducing penalized local discontinuities in the forecast trajectory, we effectively handle the non-convexity of loss landscapes commonly encountered in training neural networks for chaotic systems. We demonstrate the effectiveness of our method through its application to two challenging use-cases: the prediction of flow velocity evolution in two-dimensional turbulence and ocean dynamics using reanalysis data. Our results highlight the potential of this approach for accurate and stable long-term prediction of chaotic dynamics, paving the way for new advancements in data-driven modeling of complex natural phenomena.
Abstract:Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as the geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are ineffective for learning chaotic dynamical systems. In this work, we propose a novel NODE-training approach that allows for robust learning of chaotic dynamical systems. Our method addresses the challenges of non-convexity and exploding gradients associated with underlying chaotic dynamics. Training data trajectories from such systems are split into multiple, non-overlapping time windows. In addition to the deviation from the training data, the optimization loss term further penalizes the discontinuities of the predicted trajectory between the time windows. The window size is selected based on the fastest Lyapunov time scale of the system. Multi-step penalty(MP) method is first demonstrated on Lorenz equation, to illustrate how it improves the loss landscape and thereby accelerating the optimization convergence. MP method can optimize chaotic systems in a manner similar to least-squares shadowing with significantly lower computational costs. Our proposed algorithm, denoted the Multistep Penalty NODE(MP-NODE), is applied to chaotic systems such as the Kuramoto-Sivashinsky equation and the two-dimensional Kolmogorov flow. It is observed that MP-NODE provide viable performance for such chaotic systems, not only for short-term trajectory predictions but also for invariant statistics that are hallmarks of the chaotic nature of these dynamics.
Abstract:In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and the Jacobian which play a role in amplifying the initial one-step error made by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and the upper bound of the system Jacobian which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation.