In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference, and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase our early progress with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
Providing accurate uncertainty estimations is essential for producing reliable machine learning models, especially in safety-critical applications such as accelerator systems. Gaussian process models are generally regarded as the gold standard method for this task, but they can struggle with large, high-dimensional datasets. Combining deep neural networks with Gaussian process approximation techniques has shown promising results, but dimensionality reduction through standard deep neural network layers is not guaranteed to maintain the distance information necessary for Gaussian process models. We build on previous work by comparing the use of the singular value decomposition against a spectral-normalized dense layer as a feature extractor for a deep neural Gaussian process approximation model and apply it to a capacitance prediction problem for the High Voltage Converter Modulators in the Oak Ridge Spallation Neutron Source. Our model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error.
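The distance-preservation concern above can be illustrated with a minimal NumPy sketch. It is not the authors' model: the function names (`svd_features`, `spectral_normalize`, `max_distance_ratio`) and the bound of 1.0 are assumptions chosen for illustration. It contrasts a truncated-SVD projection with a linear map whose spectral norm is capped, and measures how much either one can stretch pairwise distances.

```python
import numpy as np

def svd_features(data, k):
    """Project centered data onto its top-k right singular vectors."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def spectral_normalize(weight, bound=1.0):
    """Rescale a weight matrix so its largest singular value is at most `bound`."""
    sigma_max = np.linalg.norm(weight, 2)  # spectral norm of a 2-D array
    return weight * min(1.0, bound / sigma_max)

def max_distance_ratio(inputs, features):
    """Largest ratio of feature-space distance to input-space distance over all pairs."""
    in_d = np.linalg.norm(inputs[:, None] - inputs[None], axis=-1)
    out_d = np.linalg.norm(features[:, None] - features[None], axis=-1)
    mask = in_d > 0  # skip the zero diagonal
    return (out_d[mask] / in_d[mask]).max()
```

Both extractors are non-expansive here (the ratio never exceeds the bound), which is the upper half of the distance-preservation property; neither guarantees a lower bound on its own, since any reduction to fewer dimensions can collapse some directions.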
We present a multi-module framework based on a Conditional Variational Autoencoder (CVAE) to detect anomalies in the power signals coming from multiple High Voltage Converter Modulators (HVCMs). We condition the model on the specific modulator type to capture different representations of the normal waveforms and to improve the sensitivity of the model in identifying a specific type of fault when only limited samples are available for a given module type. We studied several neural network (NN) architectures for our CVAE model and evaluated model performance by examining their loss landscapes for stability and generalization. Our results for the Spallation Neutron Source (SNS) experimental data show that the trained model generalizes well to detecting multiple fault types for several HVCM module types. The results of this study can be used to improve HVCM reliability and overall SNS uptime.
We propose a novel prediction interval method for uncertainty quantification in regression tasks that learns the prediction mean and the lower and upper bounds of prediction intervals from three independently trained neural networks, using only the standard mean squared error (MSE) loss. Our method requires no distributional assumption on the data and does not introduce unusual hyperparameters to either the neural network models or the loss function. Moreover, our method can effectively identify out-of-distribution samples and reasonably quantify their uncertainty. Numerical experiments on benchmark regression problems show that our method outperforms the state-of-the-art methods with respect to predictive uncertainty quality, robustness, and identification of out-of-distribution samples.
We developed a new scalable evolution strategy with directional Gaussian smoothing (DGS-ES) for high-dimensional blackbox optimization. Standard ES methods have been shown to suffer from the curse of dimensionality, due to the random directional search and the low accuracy of Monte Carlo (MC) estimation. The key idea of this work is to develop a Gaussian smoothing approach that only averages the original objective function along $d$ orthogonal directions. In this way, the partial derivatives of the smoothed function along those directions can be represented by one-dimensional integrals, instead of the $d$-dimensional integrals in standard ES methods. As such, the averaged partial derivatives can be approximated using the Gauss-Hermite quadrature rule, as opposed to MC, which significantly improves the accuracy of the averaged gradients. Moreover, the smoothing technique reduces the barrier of local minima, such that global minima become easier to reach. We provide three sets of examples to demonstrate the performance of our method, including benchmark functions for global optimization and a rocket shell design problem.
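The one-dimensional integrals described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: it assumes the smoothing along a unit direction $\xi$ is $G(y) = \mathbb{E}_{v \sim N(0,1)}[F(x + (y + \sigma v)\xi)]$, so that $G'(0) = \frac{1}{\sigma}\mathbb{E}[v\,F(x + \sigma v \xi)]$, and it evaluates that expectation with NumPy's Gauss-Hermite rule. The names `dgs_gradient`, `sigma`, and `num_nodes` are hypothetical.

```python
import numpy as np

def dgs_gradient(f, x, sigma=0.5, num_nodes=7, directions=None):
    """Estimate the gradient of the directionally smoothed objective at x.

    f          : callable mapping a 1-D array to a float (the blackbox objective)
    sigma      : smoothing radius (assumed shared by all directions)
    num_nodes  : number of Gauss-Hermite quadrature points per direction
    directions : (d, d) array whose rows are orthonormal directions; identity if None
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    if directions is None:
        directions = np.eye(d)
    # Nodes/weights for the weight function exp(-t^2) (physicists' convention).
    nodes, weights = np.polynomial.hermite.hermgauss(num_nodes)
    partials = np.empty(d)
    for i in range(d):
        xi = directions[i]
        # Change of variables v = sqrt(2) * t maps N(0, 1) onto the exp(-t^2) weight.
        values = np.array([f(x + np.sqrt(2.0) * sigma * t * xi) for t in nodes])
        partials[i] = np.sum(weights * np.sqrt(2.0) * nodes * values) / (np.sqrt(np.pi) * sigma)
    # Rotate the directional derivatives back to the original coordinates.
    return directions.T @ partials
```

Each direction costs `num_nodes` function evaluations, so one gradient estimate needs `d * num_nodes` evaluations in total. For a quadratic objective such as `f(z) = sum(z**2)`, the quadrature is exact and the estimate equals `2 * x`.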
Improving predictive understanding of Earth system variability and change requires data-model integration. Efficient data-model integration for complex models requires surrogate modeling to reduce model evaluation time. However, building a surrogate of a large-scale Earth system model (ESM) with many output variables is computationally intensive because it involves a large number of expensive ESM simulations. In this effort, we propose an efficient surrogate method capable of using a few ESM runs to build an accurate and fast-to-evaluate surrogate system of model outputs over large spatial and temporal domains. We first use singular value decomposition to reduce the output dimensions, and then use Bayesian optimization techniques to generate an accurate neural network surrogate model based on limited ESM simulation samples. Our machine-learning-based surrogate methods can build and evaluate a large surrogate system of many variables quickly. Thus, whenever the quantities of interest change, for example to a different objective function, a new site, or a longer simulation time, we can simply extract the information of interest from the surrogate system without building new surrogates, which saves significant computational effort. We apply the proposed method to a regional ecosystem model to approximate the relationship between 8 model parameters and 42660 carbon flux outputs. Results indicate that using only 20 model simulations, we can build an accurate surrogate system of the 42660 variables, where the consistency between the surrogate prediction and actual model simulation is 0.93 and the mean squared error is 0.02. This highly accurate and fast-to-evaluate surrogate system will greatly enhance the computational efficiency of data-model integration to improve predictions and advance our understanding of the Earth system.
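The two-stage idea above (compress the outputs with SVD, then learn a map from parameters to the reduced coefficients) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: a linear least-squares fit stands in for the Bayesian-optimized neural network described in the abstract, and the names `fit_svd_surrogate` and `rank` are hypothetical.

```python
import numpy as np

def fit_svd_surrogate(params, outputs, rank):
    """Fit a surrogate from model parameters to high-dimensional outputs.

    params  : (n_runs, n_params) array of simulation inputs
    outputs : (n_runs, n_outputs) array of simulation outputs
    rank    : number of singular vectors kept for the reduced output space
    """
    mean = outputs.mean(axis=0)
    centered = outputs - mean
    # Stage 1: SVD gives an orthonormal basis for the dominant output patterns.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                    # (rank, n_outputs)
    coeffs = centered @ basis.T          # (n_runs, rank) reduced targets

    # Stage 2: regress reduced coefficients on parameters.
    # ASSUMPTION: linear least squares replaces the neural network here.
    design = np.hstack([params, np.ones((params.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, coeffs, rcond=None)

    def predict(new_params):
        """Return full-dimensional output predictions for new parameter sets."""
        new_params = np.atleast_2d(new_params)
        design_new = np.hstack([new_params, np.ones((new_params.shape[0], 1))])
        return design_new @ weights @ basis + mean

    return predict
```

Because the surrogate predicts the full output field, a new quantity of interest (a subset of sites, a different time window) can be read off the prediction by indexing rather than by refitting.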