Simone Balloccu

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

Mar 30, 2026

Large Language Models as Span Annotators

Apr 11, 2025

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Aug 17, 2024

factgenie: A Framework for Span-based Evaluation of Generated Texts

Jul 25, 2024

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Feb 06, 2024

Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration

Jan 16, 2024

Comparing informativeness of an NLG chatbot vs graphical app in diet-information domain

Jul 02, 2022

How are you? Introducing stress-based text tailoring

Jul 20, 2020