
Sharan Narang

Jack

Training LLMs with Fault Tolerant HSDP on 100,000 GPUs

Jan 30, 2026
Via arXiv

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Compute Optimal Scaling of Skills: Knowledge vs Reasoning

Mar 13, 2025
Via arXiv

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks

Feb 24, 2025
Via arXiv

Law of the Weakest Link: Cross Capabilities of Large Language Models

Sep 30, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

Quantifying Variance in Evaluation Benchmarks

Jun 14, 2024
Via arXiv

Effective Long-Context Scaling of Foundation Models

Sep 27, 2023
Via arXiv

Llama 2: Open Foundation and Fine-Tuned Chat Models

Jul 19, 2023
Via arXiv

A Theory on Adam Instability in Large-Scale Machine Learning

Apr 25, 2023
Via arXiv