Aaron Defazio

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Mar 06, 2024
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

Via arXiv

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

Oct 11, 2023
Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

Via arXiv

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

Jun 09, 2023
Konstantin Mishchenko, Aaron Defazio

Via arXiv

Mechanic: A Learning Rate Tuner

Jun 02, 2023
Ashok Cutkosky, Aaron Defazio, Harsh Mehta

Via arXiv

MoMo: Momentum Models for Adaptive Learning Rates

May 12, 2023
Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower

Via arXiv

Learning-Rate-Free Learning by D-Adaptation

Jan 20, 2023
Aaron Defazio, Konstantin Mishchenko

Via arXiv

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

Jun 14, 2022
Aaron Defazio, Baoyu Zhou, Lin Xiao

Via arXiv

Stochastic Polyak Stepsize with a Moving Target

Jun 22, 2021
Robert M. Gower, Aaron Defazio, Michael Rabbat

Via arXiv

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

Jan 26, 2021
Aaron Defazio, Samy Jelassi

Via arXiv

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

Oct 20, 2020
Samy Jelassi, Aaron Defazio

Via arXiv