Vilém Zouhar

When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation

Feb 18, 2026

Pearmut: Human Evaluation of Translation Made Trivial

Jan 06, 2026

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Dec 24, 2025

Searching for Difficult-to-Translate Test Examples at Scale

Sep 30, 2025

Deconstructing Self-Bias in LLM-generated Translation Benchmarks

Sep 30, 2025

Generating Difficult-to-Translate Texts

Sep 30, 2025

Biased Tales: Cultural and Topic Bias in Generating Children's Stories

Sep 09, 2025

Estimating Machine Translation Difficulty

Aug 13, 2025

Can Large Language Models Capture Human Annotator Disagreements?

Jun 24, 2025

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

May 29, 2025