
Tengyu Ma

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Jul 23, 2023

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

Jul 07, 2023

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

Jun 28, 2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

Jun 22, 2023

Large Language Models as Tool Makers

May 26, 2023

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

May 24, 2023

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

May 23, 2023

Symbol tuning improves in-context learning in language models

May 15, 2023

Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields

Apr 29, 2023

Larger language models do in-context learning differently

Mar 08, 2023