Aaron Defazio

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Mar 06, 2024
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

Via arXiv

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

Oct 11, 2023
Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

Via arXiv

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

Jun 09, 2023
Konstantin Mishchenko, Aaron Defazio

Via arXiv

Mechanic: A Learning Rate Tuner

Jun 02, 2023
Ashok Cutkosky, Aaron Defazio, Harsh Mehta

Via arXiv

MoMo: Momentum Models for Adaptive Learning Rates

May 12, 2023
Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower

Via arXiv

Learning-Rate-Free Learning by D-Adaptation

Jan 20, 2023
Aaron Defazio, Konstantin Mishchenko

Via arXiv

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

Jun 14, 2022
Aaron Defazio, Baoyu Zhou, Lin Xiao

Via arXiv

Stochastic Polyak Stepsize with a Moving Target

Jun 22, 2021
Robert M. Gower, Aaron Defazio, Michael Rabbat

Via arXiv

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

Jan 26, 2021
Aaron Defazio, Samy Jelassi

Via arXiv

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

Oct 20, 2020
Samy Jelassi, Aaron Defazio

Via arXiv