Markus Freitag

TranslateGemma Technical Report

Jan 15, 2026

Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results

Nov 07, 2025

Searching for Difficult-to-Translate Test Examples at Scale

Sep 30, 2025

Generating Difficult-to-Translate Texts

Sep 30, 2025

Deconstructing Self-Bias in LLM-generated Translation Benchmarks

Sep 30, 2025

You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation

Apr 01, 2025

Enhancing Human Evaluation in Machine Translation with Comparative Judgment

Feb 25, 2025

WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects

Feb 18, 2025

Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation

Jan 30, 2025

From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set

Nov 23, 2024