Abstract:Inverse problems have many applications in science and engineering. In Computer vision, several image restoration tasks such as inpainting, deblurring, and super-resolution can be formally modeled as inverse problems. Recently, methods have been developed for solving inverse problems that only leverage a pre-trained unconditional diffusion model and do not require additional task-specific training. In such methods, however, the inherent intractability of determining the conditional score function during the reverse diffusion process poses a real challenge, leaving the methods to settle with an approximation instead, which affects their performance in practice. Here, we propose a MAP estimation framework to model the reverse conditional generation process of a continuous time diffusion model as an optimization process of the underlying MAP objective, whose gradient term is tractable. In theory, the proposed framework can be applied to solve general inverse problems using gradient-based optimization methods. However, given the highly non-convex nature of the loss objective, finding a perfect gradient-based optimization algorithm can be quite challenging, nevertheless, our framework offers several potential research directions. We use our proposed formulation and develop empirically effective algorithms for solving noiseless and noisy image inpainting tasks. We validate our proposed algorithms with extensive experiments across diverse mask settings.
Abstract:Flow control is key to maximize energy efficiency in a wide range of applications. However, traditional flow-control methods face significant challenges in addressing non-linear systems and high-dimensional data, limiting their application in realistic energy systems. This study advances deep-reinforcement-learning (DRL) methods for flow control, particularly focusing on integrating group-invariant networks and positional encoding into DRL architectures. Our methods leverage multi-agent reinforcement learning (MARL) to exploit policy invariance in space, in combination with group-invariant networks to ensure local symmetry invariance. Additionally, a positional encoding inspired by the transformer architecture is incorporated to provide location information to the agents, mitigating action constraints from strict invariance. The proposed methods are verified using a case study of Rayleigh-B\'enard convection, where the goal is to minimize the Nusselt number Nu. The group-invariant neural networks (GI-NNs) show faster convergence compared to the base MARL, achieving better average policy performance. The GI-NNs not only cut DRL training time in half but also notably enhance learning reproducibility. Positional encoding further enhances these results, effectively reducing the minimum Nu and stabilizing convergence. Interestingly, group invariant networks specialize in improving learning speed and positional encoding specializes in improving learning quality. These results demonstrate that choosing a suitable feature-representation method according to the purpose as well as the characteristics of each control problem is essential. We believe that the results of this study will not only inspire novel DRL methods with invariant and unique representations, but also provide useful insights for industrial applications.
Abstract:Are score function estimators an underestimated approach to learning with $k$-subset sampling? Sampling $k$-subsets is a fundamental operation in many machine learning tasks that is not amenable to differentiable parametrization, impeding gradient-based optimization. Prior work has focused on relaxed sampling or pathwise gradient estimators. Inspired by the success of score function estimators in variational inference and reinforcement learning, we revisit them within the context of $k$-subset sampling. Specifically, we demonstrate how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform, and reduce the estimator's variance with control variates. The resulting estimator provides both exact samples and unbiased gradient estimates while also applying to non-differentiable downstream models, unlike existing methods. Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
Abstract:Artificial Intelligence (AI) has the potential to revolutionize space exploration by delegating several spacecraft decisions to an onboard AI instead of relying on ground control and predefined procedures. It is likely that there will be an AI/ML Processing Unit onboard the spacecraft running an inference engine. The neural-network will have pre-installed parameters that can be updated onboard by uploading, by telecommands, parameters obtained by training on the ground. However, satellite uplinks have limited bandwidth and transmissions can be costly. Furthermore, a mission operating with a suboptimal neural network will miss out on valuable scientific data. Smaller networks can thereby decrease the uplink cost, while increasing the value of the scientific data that is downloaded. In this work, we evaluate and discuss the use of reduced-precision and bare-minimum neural networks to reduce the time for upload. As an example of an AI use case, we focus on the NASA's Magnetosperic MultiScale (MMS) mission. We show how an AI onboard could be used in the Earth's magnetosphere to classify data to selectively downlink higher value data or to recognize a region-of-interest to trigger a burst-mode, collecting data at a high-rate. Using a simple filtering scheme and algorithm, we show how the start and end of a region-of-interest can be detected in on a stream of classifications. To provide the classifications, we use an established Convolutional Neural Network (CNN) trained to an accuracy >94%. We also show how the network can be reduced to a single linear layer and trained to the same accuracy as the established CNN. Thereby, reducing the overall size of the model by up to 98.9%. We further show how each network can be reduced by up to 75% of its original size, by using lower-precision formats to represent the network parameters, with a change in accuracy of less than 0.6 percentage points.
Abstract:This study aims to overcome the conventional deep-learning approaches based on convolutional neural networks, whose applicability to complex geometries and unstructured meshes is limited due to their inherent mesh dependency. We propose novel approaches to improve mesh-agnostic spatio-temporal prediction of transient flow fields using graph U-Nets, enabling accurate prediction on diverse mesh configurations. Key enhancements to the graph U-Net architecture, including the Gaussian mixture model convolutional operator and noise injection approaches, provide increased flexibility in modeling node dynamics: the former reduces prediction error by 95\% compared to conventional convolutional operators, while the latter improves long-term prediction robustness, resulting in an error reduction of 86\%. We also investigate transductive and inductive-learning perspectives of graph U-Nets with proposed improvements. In the transductive setting, they effectively predict quantities for unseen nodes within the trained graph. In the inductive setting, they successfully perform in mesh scenarios with different vortex-shedding periods, showing 98\% improvement in predicting the future flow fields compared to a model trained without the inductive settings. It is found that graph U-Nets without pooling operations, i.e. without reducing and restoring the node dimensionality of the graph data, perform better in inductive settings due to their ability to learn from the detailed structure of each graph. Meanwhile, we also discover that the choice of normalization technique significantly impacts graph U-Net performance.
Abstract:Technological advancements have substantially increased computational power and data availability, enabling the application of powerful machine-learning (ML) techniques across various fields. However, our ability to leverage ML methods for scientific discovery, {\it i.e.} to obtain fundamental and formalized knowledge about natural processes, is still in its infancy. In this review, we explore how the scientific community can increasingly leverage ML techniques to achieve scientific discoveries. We observe that the applicability and opportunity of ML depends strongly on the nature of the problem domain, and whether we have full ({\it e.g.}, turbulence), partial ({\it e.g.}, computational biochemistry), or no ({\it e.g.}, neuroscience) {\it a-priori} knowledge about the governing equations and physical properties of the system. Although challenges remain, principled use of ML is opening up new avenues for fundamental scientific discoveries. Throughout these diverse fields, there is a theme that ML is enabling researchers to embrace complexity in observational data that was previously intractable to classic analysis and numerical investigations.
Abstract:Neural-network models have been employed to predict the instantaneous flow close to the wall in a viscoelastic turbulent channel flow. The numerical simulation data at the wall is utilized to predict the instantaneous velocity fluctuations and polymeric-stress fluctuations at three different wall-normal positions. Apart from predicting the velocity fluctuations well in a hibernating flow, the neural-network models are also shown to predict the polymeric shear stress and the trace of the polymeric stresses at a given wall-normal location with reasonably good accuracy. These non-intrusive sensing models can be integrated in an experimental setting to construct the polymeric-stress field in turbulent flows, which otherwise may not be directly quantifiable in experimental measurements.
Abstract:Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.
Abstract:To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention. Due to the fact that self attention only makes usage of the inner product of queries and keys, it is demonstrated that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through implementing singular-value decomposition (SVD) on the softmax attention score, we further observe that the self attention compresses contribution from both queries and keys in the spanned space of the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems exhibiting more robustness and less complexity than the self attention or the widely-used long short-term memory (LSTM) network. Our results show great potential for applications in more complex high-dimensional dynamical systems.
Abstract:Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.