Complex design problems are common in the scientific and industrial fields. In practice, objective functions or constraints of these problems often do not have explicit formulas, and can be estimated only at a set of sampling points through experiments or simulations. Such optimization problems are especially challenging when design parameters are high-dimensional due to the curse of dimensionality. In this work, we propose a data-informed deep optimization (DiDo) approach as follows: first, we use a deep neural network (DNN) classifier to learn the feasible region; second, we sample feasible points based on the DNN classifier for fitting of the objective function; finally, we find optimal points of the DNN-surrogate optimization problem by gradient descent. To demonstrate the effectiveness of our DiDo approach, we consider a practical design case in industry, in which our approach yields good solutions using limited size of training data. We further use a 100-dimension toy example to show the effectiveness of our model for higher dimensional problems. Our results indicate that the DiDo approach empowered by DNN is flexible and promising for solving general high-dimensional design problems in practice.
In this paper, we propose a model-operator-data network (MOD-Net) for solving PDEs. A MOD-Net is driven by a model to solve PDEs based on operator representation with regularization from data. In this work, we use a deep neural network to parameterize the Green's function. The empirical risk consists of the mean square of the governing equation, boundary conditions, and a few labels, which are numerically computed by traditional schemes on coarse grid points with cheap computation cost. With only the labeled dataset or only the model constraints, it is insufficient to accurately train a MOD-Net for complicate problems. Intuitively, the labeled dataset works as a regularization in addition to the model constraints. The MOD-Net is much efficient than original neural operator because the MOD-Net also uses the information of governing equation and the boundary conditions of the PDE rather than purely the expensive labels. Since the MOD-Net learns the Green's function of a PDE, it solves a type of PDEs but not a specific case. We numerically show MOD-Net is very efficient in solving Poisson equation and one-dimensional Boltzmann equation. For non-linear PDEs, where the concept of the Green's function does not apply, the non-linear MOD-Net can be similarly used as an ansatz for solving non-linear PDEs.
Deep neural network (DNN) usually learns the target function from low to high frequency, which is called frequency principle or spectral bias. This frequency principle sheds light on a high-frequency curse of DNNs -- difficult to learn high-frequency information. Inspired by the frequency principle, a series of works are devoted to develop algorithms for overcoming the high-frequency curse. A natural question arises: what is the upper limit of the decaying rate w.r.t. frequency when one trains a DNN? In this work, our theory, confirmed by numerical experiments, suggests that there is a critical decaying rate w.r.t. frequency in DNN training. Below the upper limit of the decaying rate, the DNN interpolates the training data by a function with a certain regularity. However, above the upper limit, the DNN interpolates the training data by a trivial function, i.e., a function is only non-zero at training data points. Our results indicate a better way to overcome the high-frequency curse is to design a proper pre-condition approach to shift high-frequency information to low-frequency one, which coincides with several previous developed algorithms for fast learning high-frequency information. More importantly, this work rigorously proves that the high-frequency curse is an intrinsic difficulty of DNNs.
Understanding the structure of loss landscape of deep neural networks (DNNs)is obviously important. In this work, we prove an embedding principle that the loss landscape of a DNN "contains" all the critical points of all the narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., local or global minima, of a narrower DNN can be embedded to a critical point/hyperplane of the target DNN with higher degeneracy and preserving the DNN output function. The embedding structure of critical points is independent of loss function and training data, showing a stark difference from other nonconvex problems such as protein-folding. Empirically, we find that a wide DNN is often attracted by highly-degenerate critical points that are embedded from narrow DNNs. The embedding principle provides an explanation for the general easy optimization of wide DNNs and unravels a potential implicit low-complexity regularization during the training. Overall, our work provides a skeleton for the study of loss landscape of DNNs and its implication, by which a more exact and comprehensive understanding can be anticipated in the near
It is important to study what implicit regularization is imposed on the loss function during the training that leads over-parameterized neural networks (NNs) to good performance on real dataset. Empirically, existing works have shown that weights of NNs condense on isolated orientations with small initialization. The condensation implies that the NN learns features from the training data and is effectively a much smaller network. In this work, we show that the singularity of the activation function at original point is a key factor to understanding the condensation at initial training stage. Our experiments suggest that the maximal number of condensed orientations is twice of the singularity order. Our theoretical analysis confirms experiments for two cases, one is for the first-order singularity activation function and the other is for the one-dimensional input. This work takes a step towards understanding how small initialization implicitly leads NNs to condensation at initial training, which is crucial to understand the training and the learning of deep NNs.
Why heavily parameterized neural networks (NNs) do not overfit the data is an important long standing open question. We propose a phenomenological model of the NN training to explain this non-overfitting puzzle. Our linear frequency principle (LFP) model accounts for a key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic details. Theory based on our LFP model shows that low frequency dominance of target functions is the key condition for the non-overfitting of NNs and is verified by experiments. Furthermore, through an ideal two-layer NN, we unravel how detailed microscopic NN training dynamics statistically gives rise to a LFP model with quantitative prediction power.
A supervised learning problem is to find a function in a hypothesis function space given values on isolated data points. Inspired by the frequency principle in neural networks, we propose a Fourier-domain variational formulation for supervised learning problem. This formulation circumvents the difficulty of imposing the constraints of given values on isolated data points in continuum modelling. Under a necessary and sufficient condition within our unified framework, we establish the well-posedness of the Fourier-domain variational problem, by showing a critical exponent depending on the data dimension. In practice, a neural network can be a convenient way to implement our formulation, which automatically satisfies the well-posedness condition.
Developing efficient and accurate algorithms for chemistry integration is a challenging task due to its strong stiffness and high dimensionality. The current work presents a deep learning-based numerical method called DeepCombustion0.0 to solve stiff ordinary differential equation systems. The homogeneous autoignition of DME/air mixture, including 54 species, is adopted as an example to illustrate the validity and accuracy of the algorithm. The training and testing datasets cover a wide range of temperature, pressure, and mixture conditions between 750-1200 K, 30-50 atm, and equivalence ratio = 0.7-1.5. Both the first-stage low-temperature ignition (LTI) and the second-stage high-temperature ignition (HTI) are considered. The methodology highlights the importance of the adaptive data sampling techniques, power transform preprocessing, and binary deep neural network (DNN) design. By using the adaptive random samplings and appropriate power transforms, smooth submanifolds in the state vector phase space are observed, on which two three-layer DNNs can be appropriately trained. The neural networks are end-to-end, which predict temporal gradients of the state vectors directly. The results show that temporal evolutions predicted by DNN agree well with traditional numerical methods in all state vector dimensions, including temperature, pressure, and species concentrations. Besides, the ignition delay time differences are within 1%. At the same time, the CPU time is reduced by more than 20 times and 200 times compared with the HMTS and VODE method, respectively. The current work demonstrates the enormous potential of applying the deep learning algorithm in chemical kinetics and combustion modeling.