Davide Gallon

SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures

May 14, 2025

Julian Kranz, Davide Gallon, Steffen Dereich, Arnulf Jentzen

Abstract: We study gradient flows for loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus, or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove the existence of a threshold $\varepsilon>0$ such that the loss value of any gradient flow initialized at most $\varepsilon$ above the optimal level converges to it. For polynomial target functions and sufficiently large architectures and data sets, we prove that the optimal loss value is zero and can only be realized asymptotically. From this setting, we deduce our main result that any gradient flow with sufficiently good initialization diverges to infinity. Our proof relies heavily on the geometry of o-minimal structures. We confirm these theoretical findings with numerical experiments and extend our investigation to real-world scenarios, where we observe analogous behavior.

* 27 pages, 4 figures
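
The phenomenon described in this abstract (a loss whose infimum is approached only as the parameters run off to infinity) can be illustrated with a toy example. The sketch below is not the paper's setting: it assumes a single logistic neuron with one parameter `w` fitted to the constant target 1, and it replaces the continuous gradient flow with an explicit Euler discretization. All function names and the step size are hypothetical choices made for illustration.

```python
import math


def sigmoid(w):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-w))


def loss(w):
    """Squared error of a one-parameter logistic 'network' against the target 1.

    The infimum 0 is never attained: sigmoid(w) < 1 for every finite w.
    """
    return (sigmoid(w) - 1.0) ** 2


def grad(w):
    """Derivative of loss(w); it is strictly negative for every finite w."""
    s = sigmoid(w)
    return 2.0 * (s - 1.0) * s * (1.0 - s)


def euler_gradient_flow(w0, step=0.5, n_steps=20000):
    """Explicit Euler discretization of the gradient flow w'(t) = -loss'(w(t)).

    The step size and number of steps are assumptions for this toy example.
    """
    w = w0
    trajectory = [w]
    for _ in range(n_steps):
        w -= step * grad(w)
        trajectory.append(w)
    return trajectory


if __name__ == "__main__":
    traj = euler_gradient_flow(0.0)
    for k in (0, 10, 100, 1000, 20000):
        print(f"step {k:>6}: w = {traj[k]:.4f}, loss = {loss(traj[k]):.3e}")
```

In this toy case the parameter grows without bound (roughly logarithmically in time) while the loss decreases toward zero, mirroring on a small scale the divergence the abstract describes.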

An overview of diffusion models for generative artificial intelligence

Dec 02, 2024

Davide Gallon, Arnulf Jentzen, Philippe von Wurstemberger

Abstract: This article provides a mathematically rigorous introduction to denoising diffusion probabilistic models (DDPMs), sometimes also referred to as diffusion probabilistic models or diffusion models, for generative artificial intelligence. We provide a detailed basic mathematical framework for DDPMs and explain the main ideas behind training and generation procedures. In this overview article we also review selected extensions and improvements of the basic framework from the literature such as improved DDPMs, denoising diffusion implicit models, classifier-free diffusion guidance models, and latent diffusion models.

* 56 pages, 5 figures
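
As a rough illustration of the basic DDPM framework this abstract refers to, the sketch below implements the standard closed-form forward (noising) step, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$. It is not taken from the article: the linear noise schedule and its endpoint values are assumptions (a commonly used default), indexing is zero-based, and all function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_t (schedule values are assumptions)."""
    if num_steps == 1:
        return [beta_start]
    return [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]


def cumulative_alphas(betas):
    """Running products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    alpha_bars = []
    running_product = 1.0
    for beta in betas:
        running_product *= 1.0 - beta
        alpha_bars.append(running_product)
    return alpha_bars


def forward_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0 in one shot, using the closed-form forward marginal."""
    a = alpha_bars[t]
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule())
    x0 = [1.0, -2.0, 0.5]
    for t in (0, 250, 999):
        print(t, forward_sample(x0, t, alpha_bars, rng))
```

Under this assumed schedule, early steps leave the data almost unchanged, while at the final step the sample is nearly pure Gaussian noise, which is the starting point for the generation procedure.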
