Isaac Caswell

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text

Nov 11, 2023
Isaac Caswell, Lisa Wang, Isabel Papadimitriou

Via arXiv

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Sep 09, 2023
Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat

Via arXiv

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

May 24, 2023
Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar

Via arXiv

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Mar 27, 2023
Alex Jones, Isaac Caswell, Ishank Saxena, Orhan Firat

Via arXiv

Building Machine Translation Systems for the Next Thousand Languages

May 16, 2022
Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

Via arXiv

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Jan 13, 2022
Aditya Siddhant, Ankur Bapna, Orhan Firat, Yuan Cao, Mia Xu Chen, Isaac Caswell, Xavier Garcia

Via arXiv

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Mar 22, 2021
Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi

Via arXiv

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

Oct 29, 2020
Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna

Via arXiv

BLEU might be Guilty but References are not Innocent

Apr 13, 2020
Markus Freitag, David Grangier, Isaac Caswell

Via arXiv