Emiel van Miltenburg

Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations

Dec 21, 2023

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023

Evaluating NLG systems: A brief introduction

Mar 29, 2023

Implicit causality in GPT-2: a case study

Dec 08, 2022

Underreporting of errors in NLG output, and what to do about it

Aug 08, 2021

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Jun 16, 2021

Preregistering NLP Research

Mar 23, 2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Feb 03, 2021

On the use of human reference data for evaluating automatic image descriptions

Jun 15, 2020

Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

Aug 23, 2019