Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Massimiliano Datres

The Price of Robustness: Stable Classifiers Need Overparameterization

Mar 03, 2026

Jonas von Berg, Adalbert Fono, Massimiliano Datres, Sohir Maskey, Gitta Kutyniok

Abstract:The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion of robustness, we derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke beyond smoothness assumptions to discontinuous functions. In particular, any interpolating model with $p \approx n$ parameters on $n$ data points must be unstable, implying that substantial overparameterization is necessary to achieve high stability. We obtain analogous results for parameterized infinite function classes by analyzing a stronger robustness measure derived from the margin in the codomain, which we refer to as the normalized co-stability. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.

* In Proceedings of the Fourteenth International Conference on Learning Representations (ICLR), 2026
* 29 pages, 9 figures. Accepted at ICLR 2026

Via

Access Paper or Ask Questions

A Two-Scale Complexity Measure for Deep Learning Models

Jan 17, 2024

Massimiliano Datres, Gian Paolo Leonardi, Alessio Figalli, David Sutter

Abstract:We introduce a novel capacity measure 2sED for statistical models based on the effective dimension. The new quantity provably bounds the generalization error under mild assumptions on the model. Furthermore, simulations on standard data sets and popular model architectures show that 2sED correlates well with the training error. For Markovian models, we show how to efficiently approximate 2sED from below through a layerwise iterative approach, which allows us to tackle deep learning models with a large number of parameters. Simulation results suggest that the approximation is good for different prominent models and data sets.

Via

Access Paper or Ask Questions