Konstantin Mishchenko

SIERRA, PSL

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

May 28, 2024

The Road Less Scheduled

May 24, 2024

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

Oct 11, 2023

Adaptive Proximal Gradient Method for Convex Optimization

Aug 04, 2023

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

Jun 09, 2023

Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity

May 29, 2023

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

May 25, 2023

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

Feb 07, 2023

Learning-Rate-Free Learning by D-Adaptation

Jan 20, 2023

Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes

Jan 17, 2023