Tristan Thrush

Synthetic Data for any Differentiable Target

Apr 09, 2026
Via arXiv

Nearest Neighbor Normalization Improves Multimodal Retrieval

Oct 31, 2024
Via arXiv

Improving Pretraining Data Using Perplexity Correlations

Sep 09, 2024
Via arXiv

ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation

Feb 07, 2024
Via arXiv

I am a Strange Dataset: Metalinguistic Tests for Language Models

Jan 10, 2024
Via arXiv

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

Jun 28, 2023
Via arXiv

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Mar 07, 2023
Via arXiv

Measuring Data

Dec 09, 2022
Via arXiv

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022
Via arXiv

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Oct 06, 2022
Via arXiv