The task of using machine learning to approximate the mapping $\mathbf{x}\mapsto\sum_{i=1}^d x_i^2$ with $x_i\in[-1,1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network is similar to finding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples is needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as $d$ increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}(d^{2.5})$, which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}(d^{4})$. Without explicit regularization (also called implicit regularization), the numerically observed sample complexity is significantly higher and is close to $\mathcal{O}(d^{4.5})$.
The quantum approximate optimization algorithm (QAOA), as a hybrid quantum/classical algorithm, has received much interest recently. QAOA can also be viewed as a variational ansatz for quantum control. However, its direct application to emergent quantum technology encounters additional physical constraints: (i) the states of the quantum system are not observable; (ii) obtaining the derivatives of the objective function can be computationally expensive or even inaccessible in experiments, and (iii) the values of the objective function may be sensitive to various sources of uncertainty, as is the case for noisy intermediate-scale quantum (NISQ) devices. Taking such constraints into account, we show that policy-gradient-based reinforcement learning (RL) algorithms are well suited for optimizing the variational parameters of QAOA in a noise-robust fashion, opening up the way for developing RL techniques for continuous quantum control. This is advantageous to help mitigate and monitor the potentially unknown sources of errors in modern quantum simulators. We analyze the performance of the algorithm for quantum state transfer problems in single- and multi-qubit systems, subject to various sources of noise such as error terms in the Hamiltonian, or quantum uncertainty in the measurement process. We show that, in noisy setups, it is capable of outperforming state-of-the-art existing optimization algorithms.
The architectures of deep neural networks (DNN) rely heavily on the underlying grid structure of variables, for instance, the lattice of pixels in an image. For general high dimensional data with variables not associated with a grid, the multi-layer perceptron and deep brief network are often used. However, it is frequently observed that those networks do not perform competitively and they are not helpful for identifying important variables. In this paper, we propose a framework that imposes on blocks of variables a chain structure obtained by step-wise greedy search so that the DNN architecture can leverage the constructed grid. We call this new neural network Deep Variable-Block Chain (DVC). Because the variable blocks are used for classification in a sequential manner, we further develop the capacity of selecting variables adaptively according to a number of regions trained by a decision tree. Our experiments show that DVC outperforms other generic DNNs and other strong classifiers. Moreover, DVC can achieve high accuracy at much reduced dimensionality and sometimes reveals drastically different sets of relevant variables for different regions.
The recently developed Deep Potential [Phys. Rev. Lett. 120, 143001, 2018] is a powerful method to represent general inter-atomic potentials using deep neural networks. The success of Deep Potential rests on the proper treatment of locality and symmetry properties of each component of the network. In this paper, we leverage its network structure to effectively represent the mapping from the atomic configuration to the electron density in Kohn-Sham density function theory (KS-DFT). By directly targeting at the self-consistent electron density, we demonstrate that the adapted network architecture, called the Deep Density, can effectively represent the electron density as the linear combination of contributions from many local clusters. The network is constructed to satisfy the translation, rotation, and permutation symmetries, and is designed to be transferable to different system sizes. We demonstrate that using a relatively small number of training snapshots, Deep Density achieves excellent performance for one-dimensional insulating and metallic systems, as well as systems with mixed insulating and metallic characters. We also demonstrate its performance for real three-dimensional systems, including small organic molecules, as well as extended systems such as water (up to $512$ molecules) and aluminum (up to $256$ atoms).
Understanding and interpreting how machine learning (ML) models make decisions have been a big challenge. While recent research has proposed various technical approaches to provide some clues as to how an ML model makes individual predictions, they cannot provide users with an ability to inspect a model as a complete entity. In this work, we propose a novel technical approach that augments a Bayesian non-parametric regression mixture model with multiple elastic nets. Using the enhanced mixture model, we can extract generalizable insights for a target model through a global approximation. To demonstrate the utility of our approach, we evaluate it on different ML models in the context of image recognition. The empirical results indicate that our proposed approach not only outperforms the state-of-the-art techniques in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of the target ML models.
It is oftentimes impossible to understand how machine learning models reach a decision. While recent research has proposed various technical approaches to provide some clues as to how a learning model makes individual decisions, they cannot provide users with ability to inspect a learning model as a complete entity. In this work, we propose a new technical approach that augments a Bayesian regression mixture model with multiple elastic nets. Using the enhanced mixture model, we extract explanations for a target model through global approximation. To demonstrate the utility of our approach, we evaluate it on different learning models covering the tasks of text mining and image recognition. Our results indicate that the proposed approach not only outperforms the state-of-the-art technique in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of a learning model.
Because of the limitations of matrix factorization, such as losing spatial structure information, the concept of low-rank tensor factorization (LRTF) has been applied for the recovery of a low dimensional subspace from high dimensional visual data. The low-rank tensor recovery is generally achieved by minimizing the loss function between the observed data and the factorization representation. The loss function is designed in various forms under different noise distribution assumptions, like $L_1$ norm for Laplacian distribution and $L_2$ norm for Gaussian distribution. However, they often fail to tackle the real data which are corrupted by the noise with unknown distribution. In this paper, we propose a generalized weighted low-rank tensor factorization method (GWLRTF) integrated with the idea of noise modelling. This procedure treats the target data as high-order tensor directly and models the noise by a Mixture of Gaussians, which is called MoG GWLRTF. The parameters in the model are estimated under the EM framework and through a new developed algorithm of weighted low-rank tensor factorization. We provide two versions of the algorithm with different tensor factorization operations, i.e., CP factorization and Tucker factorization. Extensive experiments indicate the respective advantages of this two versions in different applications and also demonstrate the effectiveness of MoG GWLRTF compared with other competing methods.
Deep neural networks have proven to be quite effective in a wide variety of machine learning tasks, ranging from improved speech recognition systems to advancing the development of autonomous vehicles. However, despite their superior performance in many applications, these models have been recently shown to be susceptible to a particular type of attack possible through the generation of particular synthetic examples referred to as adversarial samples. These samples are constructed by manipulating real examples from the training data distribution in order to "fool" the original neural model, resulting in misclassification (with high confidence) of previously correctly classified samples. Addressing this weakness is of utmost importance if deep neural architectures are to be applied to critical applications, such as those in the domain of cybersecurity. In this paper, we present an analysis of this fundamental flaw lurking in all neural architectures to uncover limitations of previously proposed defense mechanisms. More importantly, we present a unifying framework for protecting deep neural models using a non-invertible data transformation--developing two adversary-resilient architectures utilizing both linear and nonlinear dimensionality reduction. Empirical results indicate that our framework provides better robustness compared to state-of-art solutions while having negligible degradation in accuracy.