Eran Malach

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

Feb 16, 2024
Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis

Repeat After Me: Transformers are Better than State Space Models at Copying

Feb 01, 2024
Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

Auto-Regressive Next-Token Predictors are Universal Learners

Sep 13, 2023
Eran Malach

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

Sep 07, 2023
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD

Sep 04, 2023
Etay Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz

SubTuning: Efficient Finetuning for Multi-Task Learning

Feb 14, 2023
Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Jul 18, 2022
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Knowledge Distillation: Bad Models Can Be Good Role Models

Mar 28, 2022
Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

On the Power of Differentiable Learning versus PAC and SQ Learning

Aug 09, 2021
Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

Mar 01, 2021
Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro
