Kushal Tirumala

The Unreasonable Ineffectiveness of the Deeper Layers

Mar 26, 2024
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

Via arXiv

Effective pruning of web-scale datasets based on complexity of concept clusters

Jan 09, 2024
Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

Via arXiv

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

Dec 05, 2023
Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani

Via arXiv

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Aug 23, 2023
Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

Via arXiv

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

Mar 22, 2023
Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Via arXiv

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

May 22, 2022
Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan

Via arXiv

Investigating Generalization by Controlling Normalized Margin

May 08, 2022
Alexander Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue

Via arXiv

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

Apr 05, 2022
Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela

Via arXiv