Dirk Groeneveld

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024
Via arXiv

OLMo: Accelerating the Science of Language Models

Feb 07, 2024
Via arXiv

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024
Via arXiv

Paloma: A Benchmark for Evaluating Language Model Fit

Dec 16, 2023
Via arXiv

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dec 15, 2023
Via arXiv

What's In My Big Data?

Oct 31, 2023
Via arXiv

Large Language Model Distillation Doesn't Need a Teacher

May 24, 2023
Via arXiv

Continued Pretraining for Better Zero- and Few-Shot Promptability

Oct 19, 2022
Via arXiv

Documenting the English Colossal Clean Crawled Corpus

Apr 18, 2021
Via arXiv

A Simple Yet Strong Pipeline for HotpotQA

Apr 14, 2020
Via arXiv