Alert button
Picture for Craig Thomson

Craig Thomson

Alert button

AI in Energy Digital Twining: A Reinforcement Learning-based Adaptive Digital Twin Model for Green Cities

Jan 28, 2024
Lal Verda Cakir, Kubra Duran, Craig Thomson, Matthew Broadbent, Berk Canberk

Viaarxiv icon

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023
Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Jackie Cheung, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Huiyuan Lai, Chris van der Lee, Emiel van Miltenburg, Yiru Li, Saad Mahamood, Margot Mieskes, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Pablo Mosteiro Romero, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang

Figure 1 for Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Figure 2 for Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Figure 3 for Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Figure 4 for Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Viaarxiv icon

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

Figure 1 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Figure 2 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Figure 3 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Figure 4 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Viaarxiv icon

Generation Challenges: Results of the Accuracy Evaluation Shared Task

Aug 15, 2021
Craig Thomson, Ehud Reiter

Figure 1 for Generation Challenges: Results of the Accuracy Evaluation Shared Task
Figure 2 for Generation Challenges: Results of the Accuracy Evaluation Shared Task
Figure 3 for Generation Challenges: Results of the Accuracy Evaluation Shared Task
Figure 4 for Generation Challenges: Results of the Accuracy Evaluation Shared Task
Viaarxiv icon

Underreporting of errors in NLG output, and what to do about it

Aug 08, 2021
Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen

Figure 1 for Underreporting of errors in NLG output, and what to do about it
Figure 2 for Underreporting of errors in NLG output, and what to do about it
Viaarxiv icon

A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

Nov 08, 2020
Craig Thomson, Ehud Reiter

Figure 1 for A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Figure 2 for A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Figure 3 for A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Figure 4 for A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Viaarxiv icon

Shared Task on Evaluating Accuracy in Natural Language Generation

Jun 22, 2020
Ehud Reiter, Craig Thomson

Viaarxiv icon