
Elizabeth Clark


SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

May 22, 2023
Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P. Parikh

Via arXiv

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023
Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Jackie Cheung, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Huiyuan Lai, Chris van der Lee, Emiel van Miltenburg, Yiru Li, Saad Mahamood, Margot Mieskes, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Pablo Mosteiro Romero, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang

Via arXiv

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

Dec 28, 2022
Lining Zhang, João Sedoc, Simon Mille, Yufang Hou, Sebastian Gehrmann, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Miruna Clinciu, Saad Mahamood, Khyathi Chandu

Via arXiv

mFACE: Multilingual Summarization with Factual Consistency Evaluation

Dec 20, 2022
Roee Aharoni, Shashi Narayan, Joshua Maynez, Jonathan Herzig, Elizabeth Clark, Mirella Lapata

Via arXiv

Dialect-robust Evaluation of Generated Text

Nov 02, 2022
Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann

Via arXiv

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

Via arXiv

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Feb 14, 2022
Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam

Via arXiv

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

Jul 07, 2021
Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith

Via arXiv