Samuel Horváth

DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models

May 28, 2025

Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise

May 27, 2025

Fishing For Cheap And Efficient Pruners At Initialization

Feb 17, 2025

Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis

Jan 08, 2025

Generalizing in Net-Zero Microgrids: A Study with Federated PPO and TRPO

Dec 30, 2024

Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization

Dec 03, 2024

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Nov 12, 2024

Collaborative and Efficient Personalization with Mixtures of Adaptors

Oct 04, 2024

Low-Resource Machine Translation through the Lens of Personalized Federated Learning

Jun 18, 2024

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

Jun 06, 2024