Yanai Elazar

Detection and Measurement of Syntactic Templates in Generated Text

Jun 28, 2024
Via arXiv

Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG

Jun 18, 2024
Via arXiv

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

Jun 02, 2024
Via arXiv

A Survey on Data Selection for Language Models

Mar 08, 2024
Via arXiv

Calibrating Large Language Models with Sample Consistency

Feb 21, 2024
Via arXiv

OLMo: Accelerating the Science of Language Models

Feb 07, 2024
Via arXiv

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024
Via arXiv

Paloma: A Benchmark for Evaluating Language Model Fit

Dec 16, 2023
Via arXiv

Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals

Nov 16, 2023
Via arXiv

What's In My Big Data?

Oct 31, 2023
Via arXiv