Aaron Defazio

Alice

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Dec 18, 2025
Via arXiv

PARQ: Piecewise-Affine Regularized Quantization

Mar 19, 2025
Via arXiv

The Road Less Scheduled

May 24, 2024
Via arXiv

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Mar 06, 2024
Via arXiv

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

Oct 11, 2023
Via arXiv

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

Jun 09, 2023
Via arXiv

Mechanic: A Learning Rate Tuner

Jun 02, 2023
Via arXiv

MoMo: Momentum Models for Adaptive Learning Rates

May 12, 2023
Via arXiv

Learning-Rate-Free Learning by D-Adaptation

Jan 20, 2023
Via arXiv

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

Jun 14, 2022
Via arXiv