Alex Fang

Language Models Improve When Pretraining Data Matches Target Tasks

Jul 16, 2025
Via arXiv

Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality

Mar 10, 2025
Via arXiv

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024
Via arXiv

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

May 29, 2024
Via arXiv

URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

May 19, 2024
Via arXiv

Language models scale reliably with over-training and on downstream tasks

Mar 13, 2024
Via arXiv

Data Filtering Networks

Oct 02, 2023
Via arXiv

Neural Priming for Sample-Efficient Adaptation

Jun 24, 2023
Via arXiv

DataComp: In search of the next generation of multimodal datasets

May 03, 2023
Via arXiv

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Apr 14, 2023
Via arXiv