Yuanzhi Li

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Aug 29, 2024
Via arXiv

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Jul 29, 2024
Via arXiv

How Does Overparameterization Affect Features?

Jul 01, 2024
Via arXiv

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Apr 23, 2024
Via arXiv

AgentKit: Flow Engineering with Graphs, not Coding

Apr 17, 2024
Via arXiv

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

Apr 09, 2024
Via arXiv

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Apr 08, 2024
Via arXiv

Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs

Mar 23, 2024
Via arXiv

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning

Mar 01, 2024
Via arXiv

Provably learning a multi-head attention layer

Feb 06, 2024
Via arXiv