Yuanzhi Li

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

Jul 11, 2023
Via arXiv

Length Generalization in Arithmetic Transformers

Jun 27, 2023
Via arXiv

Textbooks Are All You Need

Jun 20, 2023
Via arXiv

Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY

Jun 14, 2023
Via arXiv

Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training

Jun 02, 2023
Via arXiv

Toward Understanding Why Adam Converges Faster Than SGD for Transformers

May 31, 2023
Via arXiv

SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning

May 24, 2023
Via arXiv

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

May 24, 2023
Via arXiv

Physics of Language Models: Part 1, Context-Free Grammar

May 23, 2023
Via arXiv

The probability flow ODE is provably fast

May 19, 2023
Via arXiv