Alex Fang

DatBench: Discriminative, Faithful, and Efficient VLM Evaluations

Jan 05, 2026
Via arXiv

Luxical: High-Speed Lexical-Dense Text Embeddings

Dec 11, 2025
Via arXiv

Reusing Pre-Training Data at Test Time is a Compute Multiplier

Nov 06, 2025
Via arXiv

Language Models Improve When Pretraining Data Matches Target Tasks

Jul 16, 2025
Via arXiv

Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality

Mar 10, 2025
Via arXiv

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024
Via arXiv

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

May 29, 2024
Via arXiv

URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

May 19, 2024
Via arXiv

Language models scale reliably with over-training and on downstream tasks

Mar 13, 2024
Via arXiv

Data Filtering Networks

Oct 02, 2023
Via arXiv