Abstract:We propose a new simple architecture, Zeta Neural Networks (ZeNNs), in order to overcome several shortcomings of standard multi-layer perceptrons (MLPs). Namely, in the large width limit, MLPs are non-parametric, they do not have a well-defined pointwise limit, they lose non-Gaussian attributes and become unable to perform feature learning; moreover, finite width MLPs perform poorly in learning high frequencies. The new ZeNN architecture is inspired by three simple principles from harmonic analysis: i) Enumerate the perceptons and introduce a non-learnable weight to enforce convergence; ii) Introduce a scaling (or frequency) factor; iii) Choose activation functions that lead to near orthogonal systems. We will show that these ideas allow us to fix the referred shortcomings of MLPs. In fact, in the infinite width limit, ZeNNs converge pointwise, they exhibit a rich asymptotic structure beyond Gaussianity, and perform feature learning. Moreover, when appropriate activation functions are chosen, (finite width) ZeNNs excel at learning high-frequency features of functions with low dimensional domains.
Abstract:The Neural Tangent Kernel (NTK) has emerged as a fundamental concept in the study of wide Neural Networks. In particular, it is known that the positivity of the NTK is directly related to the memorization capacity of sufficiently wide networks, i.e., to the possibility of reaching zero loss in training, via gradient descent. Here we will improve on previous works and obtain a sharp result concerning the positivity of the NTK of feedforward networks of any depth. More precisely, we will show that, for any non-polynomial activation function, the NTK is strictly positive definite. Our results are based on a novel characterization of polynomial functions which is of independent interest.