Eran Malach

Universal Length Generalization with Turing Programs
Jul 03, 2024

A New Perspective on Shampoo's Preconditioner
Jun 25, 2024

Transcendence: Generative Models Can Outperform The Experts That Train Them
Jun 17, 2024

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Feb 16, 2024

Repeat After Me: Transformers are Better than State Space Models at Copying
Feb 01, 2024

Auto-Regressive Next-Token Predictors are Universal Learners
Sep 13, 2023

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Sep 07, 2023

Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD
Sep 04, 2023

SubTuning: Efficient Finetuning for Multi-Task Learning
Feb 14, 2023

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Jul 18, 2022