Frederik Kunstner

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Feb 29, 2024
Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti

Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking

Jun 05, 2023
Frederik Kunstner, Victor S. Portella, Mark Schmidt, Nick Harvey

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be

Apr 27, 2023
Frederik Kunstner, Jacques Chen, Jonathan Wilder Lavington, Mark Schmidt

Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

Nov 12, 2021
Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent

Nov 02, 2020
Frederik Kunstner, Raunak Kumar, Mark Schmidt

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

Jun 11, 2020
Sharan Vaswani, Frederik Kunstner, Issam Laradji, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

BackPACK: Packing more into backprop

Feb 15, 2020
Felix Dangel, Frederik Kunstner, Philipp Hennig

Limitations of the Empirical Fisher Approximation

May 29, 2019
Frederik Kunstner, Lukas Balles, Philipp Hennig

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

Nov 11, 2018
Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan
