Emiel van Miltenburg

Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations

Dec 21, 2023
Anouck Braggaar, Christine Liebrecht, Emiel van Miltenburg, Emiel Krahmer

Via arXiv

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023
Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Jackie Cheung, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Huiyuan Lai, Chris van der Lee, Emiel van Miltenburg, Yiru Li, Saad Mahamood, Margot Mieskes, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Pablo Mosteiro Romero, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang

Via arXiv

Evaluating NLG systems: A brief introduction

Mar 29, 2023
Emiel van Miltenburg

Via arXiv

Implicit causality in GPT-2: a case study

Dec 08, 2022
Hien Huynh, Tomas O. Lentz, Emiel van Miltenburg

Via arXiv

Underreporting of errors in NLG output, and what to do about it

Aug 08, 2021
Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen

Via arXiv

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Jun 16, 2021
Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

Via arXiv

Preregistering NLP Research

Mar 23, 2021
Emiel van Miltenburg, Chris van der Lee, Emiel Krahmer

Via arXiv

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Feb 03, 2021
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou

Via arXiv