Eduardo Sontag

On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks

Nov 12, 2025

Arthur Castello Branco de Oliveira, Dhruv Jatkar, Eduardo Sontag

Abstract: This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We analyze the gradient flow associated with overparameterized optimization problems, which can be interpreted as training a neural network with linear activations. Remarkably, we show that the global convergence properties can be derived for any cost function that is proper and real analytic. We then specialize the analysis to scalar-valued cost functions, where the geometry of the landscape can be fully characterized. In this setting, we demonstrate that key structural features -- such as the location and stability of saddle points -- are universal across all admissible costs, depending solely on the overparameterized representation rather than on problem-specific details. Moreover, we show that convergence can be arbitrarily accelerated depending on the initialization, as measured by an imbalance metric introduced in this work. Finally, we discuss how these insights may generalize to neural networks with sigmoidal activations, showing through a simple example which geometric and dynamical properties persist beyond the linear case.
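
As a rough illustration of the scalar setting described in the abstract, the sketch below runs small-step gradient descent on the toy cost f(x, y) = 0.5 * (x*y - 1)^2, i.e. a two-factor "linear network" fitting the scalar target 1. Everything here is an assumption made for illustration: the cost, the step size, and in particular the quantity x^2 - y^2, which is the classical invariant of gradient flow for two-factor products and is used only as a stand-in, since the listing does not define the paper's imbalance metric.

```python
def imbalance(x, y):
    # Stand-in imbalance measure (assumption): x^2 - y^2 is constant along the
    # exact gradient flow of any cost that depends on x and y only through x*y.
    return x * x - y * y


def steps_to_converge(x, y, lr=1e-3, tol=1e-6, max_steps=100_000):
    # Gradient descent on the toy cost f(x, y) = 0.5 * (x*y - 1)^2.
    # Returns (steps taken, final x, final y).
    for step in range(max_steps):
        r = x * y - 1.0  # residual of the overparameterized product
        if abs(r) < tol:
            return step, x, y
        # Simultaneous update: df/dx = r*y, df/dy = r*x.
        x, y = x - lr * r * y, y - lr * r * x
    return max_steps, x, y


if __name__ == "__main__":
    # Both initializations start from the same product x*y = 0.25.
    for x0, y0 in [(0.5, 0.5), (5.0, 0.05)]:
        n, x, y = steps_to_converge(x0, y0)
        print(f"init=({x0}, {y0})  imbalance0={imbalance(x0, y0):.4f}  "
              f"steps={n}  final imbalance={imbalance(x, y):.4f}")
```

In this toy problem the residual obeys dr/dt = -(x^2 + y^2) r along the flow, and x^2 + y^2 is bounded below by |x^2 - y^2|, so a more imbalanced start with the same initial product reaches the tolerance in fewer steps. This is only meant to make the abstract's acceleration claim concrete, not to reproduce the paper's analysis.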

Exact Recovery Guarantees for Parameterized Non-linear System Identification Problem under Adversarial Attacks

Aug 30, 2024

Haixiang Zhang, Baturalp Yalcin, Javad Lavaei, Eduardo Sontag

Abstract: In this work, we study the system identification problem for parameterized non-linear systems using basis functions under adversarial attacks. Motivated by the LASSO-type estimators, we analyze the exact recovery property of a non-smooth estimator, which is generated by solving an embedded $\ell_1$-loss minimization problem. First, we derive necessary and sufficient conditions for the well-specifiedness of the estimator and the uniqueness of global solutions to the underlying optimization problem. Next, we provide exact recovery guarantees for the estimator under two different scenarios of boundedness and Lipschitz continuity of the basis functions. The non-asymptotic exact recovery is guaranteed with high probability, even when there are more severely corrupted data than clean data. Finally, we numerically illustrate the validity of our theory. This is the first study on the sample complexity analysis of a non-smooth estimator for the non-linear system identification problem.

* 33 pages
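
To give a feel for what an $\ell_1$-loss estimator does under sparse corruption, here is a minimal one-parameter sketch. It assumes a scalar model y_i = theta * phi_i + d_i with a single basis value phi_i per sample and an attack term d_i that is zero on clean samples; the data, the function names, and the reduction to a weighted median are illustrative assumptions, not the estimator or the conditions analyzed in the paper.

```python
def l1_loss(theta, phi, y):
    # Sum of absolute residuals for the scalar model y_i ~ theta * phi_i.
    return sum(abs(yi - theta * pi) for pi, yi in zip(phi, y))


def l1_estimate(phi, y):
    # For a single parameter, sum_i |y_i - theta*phi_i| equals
    # sum_i |phi_i| * |y_i/phi_i - theta|, so a minimizer is a weighted
    # median of the ratios y_i/phi_i with weights |phi_i| (phi_i != 0).
    pairs = sorted((yi / pi, abs(pi)) for pi, yi in zip(phi, y) if pi != 0.0)
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for ratio, w in pairs:
        cumulative += w
        if cumulative >= half:
            return ratio
    raise ValueError("need at least one sample with nonzero basis value")


if __name__ == "__main__":
    theta_true = 2.0
    phi = [1.0, 2.0, 0.5, 1.5, 1.0, 2.0, 1.0]
    # Hypothetical attack vector: 4 of the 7 samples are corrupted.
    attack = [0.0, 3.0, -4.0, 0.0, 5.0, -2.5, 0.0]
    y = [theta_true * p + d for p, d in zip(phi, attack)]
    theta_hat = l1_estimate(phi, y)
    print("estimate:", theta_hat, "loss:", l1_loss(theta_hat, phi, y))
```

In this hand-picked data set the corrupted samples outnumber the clean ones, yet the corruptions pull the ratios in both directions, so the weighted median still lands on the clean value. The example shows one simple mechanism by which an $\ell_1$ fit can tolerate many outliers; the probabilistic guarantees stated in the abstract rest on the paper's own assumptions, which are not reproduced here.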

Learning Recurrent Neural Net Models of Nonlinear Systems

Nov 20, 2020

Joshua Hanson, Maxim Raginsky, Eduardo Sontag

Abstract: We consider the following learning problem: Given sample pairs of input and output signals generated by an unknown nonlinear system (which is not assumed to be causal or time-invariant), we wish to find a continuous-time recurrent neural net with hyperbolic tangent activation function that approximately reproduces the underlying i/o behavior with high confidence. Leveraging earlier work concerned with matching output derivatives up to a given finite order, we reformulate the learning problem in familiar system-theoretic language and derive quantitative guarantees on the sup-norm risk of the learned model in terms of the number of neurons, the sample size, the number of derivatives being matched, and the regularity properties of the inputs, the outputs, and the unknown i/o map.

* 14 pages
