Craig Thomson

The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems

Sep 26, 2025
Via arXiv

HEDS 3.0: The Human Evaluation Data Sheet Version 3.0

Dec 10, 2024
Via arXiv

AI-based traffic analysis in digital twin networks

Nov 01, 2024
Via arXiv

AI in Energy Digital Twining: A Reinforcement Learning-based Adaptive Digital Twin Model for Green Cities

Jan 28, 2024
Via arXiv

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023
Via arXiv

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022
Via arXiv

Generation Challenges: Results of the Accuracy Evaluation Shared Task

Aug 15, 2021
Via arXiv

Underreporting of errors in NLG output, and what to do about it

Aug 08, 2021
Via arXiv

A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

Nov 08, 2020
Via arXiv

Shared Task on Evaluating Accuracy in Natural Language Generation

Jun 22, 2020
Via arXiv