Mikhail Belkin

Context-Scaling versus Task-Scaling in In-Context Learning

Oct 16, 2024
Via arXiv

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Jul 29, 2024
Via arXiv

Average gradient outer product as a mechanism for deep neural collapse

Feb 21, 2024
Via arXiv

Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination

Feb 15, 2024
Via arXiv

Linear Recursive Feature Machines provably recover low-rank matrices

Jan 09, 2024
Via arXiv

On the Nyström Approximation for Preconditioning in Kernel Machines

Dec 06, 2023
Via arXiv

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

Nov 27, 2023
Via arXiv

Mechanism of feature learning in convolutional neural networks

Sep 01, 2023
Via arXiv

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

Jun 07, 2023
Via arXiv

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

Jun 05, 2023
Via arXiv