Christoph Leiter

CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks

May 16, 2025
Via arXiv

DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?

Apr 10, 2025
Via arXiv

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?

Dec 03, 2024
Via arXiv

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation

Jun 26, 2024
Via arXiv

NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?

Dec 09, 2023
Via arXiv

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

Oct 30, 2023
Via arXiv

NLLG Quarterly arXiv Report 06/23: What are the most influential current AI Papers?

Jul 31, 2023
Via arXiv

Towards Explainable Evaluation Metrics for Machine Translation

Jun 22, 2023
Via arXiv

ChatGPT: A Meta-Analysis after 2.5 Months

Feb 20, 2023
Via arXiv

BMX: Boosting Machine Translation Metrics with Explainability

Dec 20, 2022
Via arXiv