Elizabeth Clark

SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

May 22, 2023

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

Dec 28, 2022

mFACE: Multilingual Summarization with Factual Consistency Evaluation

Dec 20, 2022

Dialect-robust Evaluation of Generated Text

Nov 02, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Feb 14, 2022

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

Jul 07, 2021

Evaluation of Text Generation: A Survey

Jun 26, 2020

Evaluating Machines by their Real-World Language Use

Apr 07, 2020