Noam Levi

The Implicit Bias of Logit Regularization

Feb 13, 2026
Via arXiv

More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

Jan 29, 2026
Via arXiv

Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model

Jan 07, 2026
Via arXiv

A Simple Model of Inference Scaling Laws

Oct 21, 2024
Via arXiv

Grokking at the Edge of Linear Separability

Oct 06, 2024
Via arXiv

Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets

May 28, 2024
Via arXiv

Decoupled Weight Decay for Any $p$ Norm

Apr 16, 2024
Via arXiv

Measuring Sharpness in Grokking

Feb 14, 2024
Via arXiv

The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence

Nov 02, 2023
Via arXiv

Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding

Oct 25, 2023
Via arXiv