Isaac Caswell

TranslateGemma Technical Report

Jan 15, 2026
Via arXiv

WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects

Feb 18, 2025
Via arXiv

SMOL: Professionally translated parallel data for 115 under-represented languages

Feb 17, 2025
Via arXiv

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text

Nov 11, 2023
Via arXiv

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Sep 09, 2023
Via arXiv

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

May 24, 2023
Via arXiv

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Mar 27, 2023
Via arXiv

Building Machine Translation Systems for the Next Thousand Languages

May 16, 2022
Via arXiv

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Jan 13, 2022
Via arXiv

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Mar 22, 2021
Via arXiv