Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

George Em Karniadakis

Leveraging Operator Learning to Accelerate Convergence of the Preconditioned Conjugate Gradient Method

Jul 31, 2025

Alena Kopaničáková, Youngkyu Lee, George Em Karniadakis

Abstract:We propose a new deflation strategy to accelerate the convergence of the preconditioned conjugate gradient(PCG) method for solving parametric large-scale linear systems of equations. Unlike traditional deflation techniques that rely on eigenvector approximations or recycled Krylov subspaces, we generate the deflation subspaces using operator learning, specifically the Deep Operator Network~(DeepONet). To this aim, we introduce two complementary approaches for assembling the deflation operators. The first approach approximates near-null space vectors of the discrete PDE operator using the basis functions learned by the DeepONet. The second approach directly leverages solutions predicted by the DeepONet. To further enhance convergence, we also propose several strategies for prescribing the sparsity pattern of the deflation operator. A comprehensive set of numerical experiments encompassing steady-state, time-dependent, scalar, and vector-valued problems posed on both structured and unstructured geometries is presented and demonstrates the effectiveness of the proposed DeepONet-based deflated PCG method, as well as its generalization across a wide range of model parameters and problem resolutions.

* 31 pages

Via

Access Paper or Ask Questions

Representation Meets Optimization: Training PINNs and PIKANs for Gray-Box Discovery in Systems Pharmacology

Apr 10, 2025

Nazanin Ahmadi Daryakenari, Khemraj Shukla, George Em Karniadakis

Abstract:Physics-Informed Kolmogorov-Arnold Networks (PIKANs) are gaining attention as an effective counterpart to the original multilayer perceptron-based Physics-Informed Neural Networks (PINNs). Both representation models can address inverse problems and facilitate gray-box system identification. However, a comprehensive understanding of their performance in terms of accuracy and speed remains underexplored. In particular, we introduce a modified PIKAN architecture, tanh-cPIKAN, which is based on Chebyshev polynomials for parametrization of the univariate functions with an extra nonlinearity for enhanced performance. We then present a systematic investigation of how choices of the optimizer, representation, and training configuration influence the performance of PINNs and PIKANs in the context of systems pharmacology modeling. We benchmark a wide range of first-order, second-order, and hybrid optimizers, including various learning rate schedulers. We use the new Optax library to identify the most effective combinations for learning gray-boxes under ill-posed, non-unique, and data-sparse conditions. We examine the influence of model architecture (MLP vs. KAN), numerical precision (single vs. double), the need for warm-up phases for second-order methods, and sensitivity to the initial learning rate. We also assess the optimizer scalability for larger models and analyze the trade-offs introduced by JAX in terms of computational efficiency and numerical accuracy. Using two representative systems pharmacology case studies - a pharmacokinetics model and a chemotherapy drug-response model - we offer practical guidance on selecting optimizers and representation models/architectures for robust and efficient gray-box discovery. Our findings provide actionable insights for improving the training of physics-informed networks in biomedical applications and beyond.

Via

Access Paper or Ask Questions

Mitigating Spectral Bias in Neural Operators via High-Frequency Scaling for Physical Systems

Mar 17, 2025

Siavash Khodakarami, Vivek Oommen, Aniruddha Bora, George Em Karniadakis

Abstract:Neural operators have emerged as powerful surrogates for modeling complex physical problems. However, they suffer from spectral bias making them oblivious to high-frequency modes, which are present in multiscale physical systems. Therefore, they tend to produce over-smoothed solutions, which is particularly problematic in modeling turbulence and for systems with intricate patterns and sharp gradients such as multi-phase flow systems. In this work, we introduce a new approach named high-frequency scaling (HFS) to mitigate spectral bias in convolutional-based neural operators. By integrating HFS with proper variants of UNet neural operators, we demonstrate a higher prediction accuracy by mitigating spectral bias in single and two-phase flow problems. Unlike Fourier-based techniques, HFS is directly applied to the latent space, thus eliminating the computational cost associated with the Fourier transform. Additionally, we investigate alternative spectral bias mitigation through diffusion models conditioned on neural operators. While the diffusion model integrated with the standard neural operator may still suffer from significant errors, these errors are substantially reduced when the diffusion model is integrated with a HFS-enhanced neural operator.

Via

Access Paper or Ask Questions

Learning and discovering multiple solutions using physics-informed neural networks with random initialization and deep ensemble

Mar 08, 2025

Zongren Zou, Zhicheng Wang, George Em Karniadakis

Abstract:We explore the capability of physics-informed neural networks (PINNs) to discover multiple solutions. Many real-world phenomena governed by nonlinear differential equations (DEs), such as fluid flow, exhibit multiple solutions under the same conditions, yet capturing this solution multiplicity remains a significant challenge. A key difficulty is giving appropriate initial conditions or initial guesses, to which the widely used time-marching schemes and Newton's iteration method are very sensitive in finding solutions for complex computational problems. While machine learning models, particularly PINNs, have shown promise in solving DEs, their ability to capture multiple solutions remains underexplored. In this work, we propose a simple and practical approach using PINNs to learn and discover multiple solutions. We first reveal that PINNs, when combined with random initialization and deep ensemble method -- originally developed for uncertainty quantification -- can effectively uncover multiple solutions to nonlinear ordinary and partial differential equations (ODEs/PDEs). Our approach highlights the critical role of initialization in shaping solution diversity, addressing an often-overlooked aspect of machine learning for scientific computing. Furthermore, we propose utilizing PINN-generated solutions as initial conditions or initial guesses for conventional numerical solvers to enhance accuracy and efficiency in capturing multiple solutions. Extensive numerical experiments, including the Allen-Cahn equation and cavity flow, where our approach successfully identifies both stable and unstable solutions, validate the effectiveness of our method. These findings establish a general and efficient framework for addressing solution multiplicity in nonlinear differential equations.

Via

Access Paper or Ask Questions

DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks

Feb 25, 2025

Qile Jiang, Zhiwei Gao, George Em Karniadakis

Abstract:Large Language Models (LLMs) have emerged as powerful tools for tackling a wide range of problems, including those in scientific computing, particularly in solving partial differential equations (PDEs). However, different models exhibit distinct strengths and preferences, resulting in varying levels of performance. In this paper, we compare the capabilities of the most advanced LLMs--ChatGPT and DeepSeek--along with their reasoning-optimized versions in addressing computational challenges. Specifically, we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems. We designed all our experiments so that a non-trivial decision is required, e.g. defining the proper space of input functions for neural operator learning. Our findings reveal that the latest model, ChatGPT o3-mini-high, usually delivers the most accurate results while also responding significantly faster than its reasoning counterpart, DeepSeek R1. This enhanced speed and accuracy make ChatGPT o3-mini-high a more practical and efficient choice for diverse computational tasks at this juncture.

Via

Access Paper or Ask Questions

Connecting the geometry and dynamics of many-body complex systems with message passing neural operators

Feb 21, 2025

Nicholas A. Gabriel, Neil F. Johnson, George Em Karniadakis

Abstract:The relationship between scale transformations and dynamics established by renormalization group techniques is a cornerstone of modern physical theories, from fluid mechanics to elementary particle physics. Integrating renormalization group methods into neural operators for many-body complex systems could provide a foundational inductive bias for learning their effective dynamics, while also uncovering multiscale organization. We introduce a scalable AI framework, ROMA (Renormalized Operators with Multiscale Attention), for learning multiscale evolution operators of many-body complex systems. In particular, we develop a renormalization procedure based on neural analogs of the geometric and laplacian renormalization groups, which can be co-learned with neural operators. An attention mechanism is used to model multiscale interactions by connecting geometric representations of local subgraphs and dynamical operators. We apply this framework in challenging conditions: large systems of more than 1M nodes, long-range interactions, and noisy input-output data for two contrasting examples: Kuramoto oscillators and Burgers-like social dynamics. We demonstrate that the ROMA framework improves scalability and positive transfer between forecasting and effective dynamics tasks compared to state-of-the-art operator learning techniques, while also giving insight into multiscale interactions. Additionally, we investigate power law scaling in the number of model parameters, and demonstrate a departure from typical power law exponents in the presence of hierarchical and multiscale interactions.

Via

Access Paper or Ask Questions

Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions

Jan 21, 2025

Qianying Cao, Shanqing Liu, Alan John Varghese, Jerome Darbon, Michael Triantafyllou, George Em Karniadakis

Abstract:Time series forecasting plays a pivotal role in a wide range of applications, including weather prediction, healthcare, structural health monitoring, predictive maintenance, energy systems, and financial markets. While models such as LSTM, GRU, Transformers, and State-Space Models (SSMs) have become standard tools in this domain, selecting the optimal architecture remains a challenge. Performance comparisons often depend on evaluation metrics and the datasets under analysis, making the choice of a universally optimal model controversial. In this work, we introduce a flexible automated framework for time series forecasting that systematically designs and evaluates diverse network architectures by integrating LSTM, GRU, multi-head Attention, and SSM blocks. Using a multi-objective optimization approach, our framework determines the number, sequence, and combination of blocks to align with specific requirements and evaluation objectives. From the resulting Pareto-optimal architectures, the best model for a given context is selected via a user-defined preference function. We validate our framework across four distinct real-world applications. Results show that a single-layer GRU or LSTM is usually optimal when minimizing training time alone. However, when maximizing accuracy or balancing multiple objectives, the best architectures are often composite designs incorporating multiple block types in specific configurations. By employing a weighted preference function, users can resolve trade-offs between objectives, revealing novel, context-specific optimal architectures. Our findings underscore that no single neural architecture is universally optimal for time series forecasting. Instead, the best-performing model emerges as a data-driven composite architecture tailored to user-defined criteria and evaluation objectives.

* 35 pages, 8 figures

Via

Access Paper or Ask Questions

Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks

Jan 15, 2025

Zhiwei Gao, George Em Karniadakis

Figure 1 for Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks

Figure 2 for Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks

Figure 3 for Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks

Figure 4 for Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks

Abstract:Uncertainty quantification (UQ) plays a pivotal role in scientific machine learning, especially when surrogate models are used to approximate complex systems. Although multilayer perceptions (MLPs) are commonly employed as surrogates, they often suffer from overfitting due to their large number of parameters. Kolmogorov-Arnold networks (KANs) offer an alternative solution with fewer parameters. However, gradient-based inference methods, such as Hamiltonian Monte Carlo (HMC), may result in computational inefficiency when applied to KANs, especially for large-scale datasets, due to the high cost of back-propagation.To address these challenges, we propose a novel approach, combining the dropout Tikhonov ensemble Kalman inversion (DTEKI) with Chebyshev KANs. This gradient-free method effectively mitigates overfitting and enhances numerical stability. Additionally, we incorporate the active subspace method to reduce the parameter-space dimensionality, allowing us to improve the accuracy of predictions and obtain more reliable uncertainty estimates.Extensive experiments demonstrate the efficacy of our approach in various test cases, including scenarios with large datasets and high noise levels. Our results show that the new method achieves comparable or better accuracy, much higher efficiency as well as stability compared to HMC, in addition to scalability. Moreover, by leveraging the low-dimensional parameter subspace, our method preserves prediction accuracy while substantially reducing further the computational cost.

Via

Access Paper or Ask Questions

DeepVIVONet: Using deep neural operators to optimize sensor locations with application to vortex-induced vibrations

Jan 07, 2025

Ruyin Wan, Ehsan Kharazmi, Michael S Triantafyllou, George Em Karniadakis

Figure 1 for DeepVIVONet: Using deep neural operators to optimize sensor locations with application to vortex-induced vibrations

Figure 2 for DeepVIVONet: Using deep neural operators to optimize sensor locations with application to vortex-induced vibrations

Figure 3 for DeepVIVONet: Using deep neural operators to optimize sensor locations with application to vortex-induced vibrations

Figure 4 for DeepVIVONet: Using deep neural operators to optimize sensor locations with application to vortex-induced vibrations

Abstract:We introduce DeepVIVONet, a new framework for optimal dynamic reconstruction and forecasting of the vortex-induced vibrations (VIV) of a marine riser, using field data. We demonstrate the effectiveness of DeepVIVONet in accurately reconstructing the motion of an off--shore marine riser by using sparse spatio-temporal measurements. We also show the generalization of our model in extrapolating to other flow conditions via transfer learning, underscoring its potential to streamline operational efficiency and enhance predictive accuracy. The trained DeepVIVONet serves as a fast and accurate surrogate model for the marine riser, which we use in an outer--loop optimization algorithm to obtain the optimal locations for placing the sensors. Furthermore, we employ an existing sensor placement method based on proper orthogonal decomposition (POD) to compare with our data-driven approach. We find that that while POD offers a good approach for initial sensor placement, DeepVIVONet's adaptive capabilities yield more precise and cost-effective configurations.

Via

Access Paper or Ask Questions

KKANs: Kurkova-Kolmogorov-Arnold Networks and Their Learning Dynamics

Dec 21, 2024

Juan Diego Toscano, Li-Lian Wang, George Em Karniadakis

Abstract:Inspired by the Kolmogorov-Arnold representation theorem and Kurkova's principle of using approximate representations, we propose the Kurkova-Kolmogorov-Arnold Network (KKAN), a new two-block architecture that combines robust multi-layer perceptron (MLP) based inner functions with flexible linear combinations of basis functions as outer functions. We first prove that KKAN is a universal approximator, and then we demonstrate its versatility across scientific machine-learning applications, including function regression, physics-informed machine learning (PIML), and operator-learning frameworks. The benchmark results show that KKANs outperform MLPs and the original Kolmogorov-Arnold Networks (KANs) in function approximation and operator learning tasks and achieve performance comparable to fully optimized MLPs for PIML. To better understand the behavior of the new representation models, we analyze their geometric complexity and learning dynamics using information bottleneck theory, identifying three universal learning stages, fitting, transition, and diffusion, across all types of architectures. We find a strong correlation between geometric complexity and signal-to-noise ratio (SNR), with optimal generalization achieved during the diffusion stage. Additionally, we propose self-scaled residual-based attention weights to maintain high SNR dynamically, ensuring uniform convergence and prolonged learning.

* Kolmogorov-Arnold representation theorem; physics-informed neural networks; Kolmogorov-Arnold networks; optimization algorithms; self-adaptive weights; information bottleneck theory

Via

Access Paper or Ask Questions