Daniel Soudry

Optimal Rates in Continual Linear Regression via Increasing Regularization

Jun 06, 2025
Via arXiv

FP4 All the Way: Fully Quantized Training of LLMs

May 25, 2025
Via arXiv

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

May 25, 2025
Via arXiv

PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training

May 23, 2025
Via arXiv

Better Rates for Random Task Orderings in Continual Linear Models

Apr 06, 2025
Via arXiv

Provable Tempered Overfitting of Minimal Nets and Typical Nets

Oct 24, 2024
Via arXiv

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks

Oct 02, 2024
Via arXiv

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Jun 10, 2024
Via arXiv

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

Feb 09, 2024
Via arXiv

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators

Jan 25, 2024
Via arXiv