Saad Mahamood

Real-World Summarization: When Evaluation Reaches Its Limits

Jul 15, 2025

Sentence Embeddings as an intermediate target in end-to-end summarisation

May 06, 2025

Large Language Models as Span Annotators

Apr 11, 2025

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Aug 17, 2024

On the Role of Summary Content Units in Text Summarization Evaluation

Apr 02, 2024

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

Dec 28, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Dec 06, 2021

Underreporting of errors in NLG output, and what to do about it

Aug 08, 2021