Maraim Masoud

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Mar 07, 2023
Via arXiv

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022
Via arXiv

Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets

Aug 01, 2022
Via arXiv

Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources

Jan 25, 2022
Via arXiv

Masader: Metadata Sourcing for Arabic Text and Speech Data Resources

Oct 13, 2021
Via arXiv

Aspects of Terminological and Named Entity Knowledge within Rule-Based Machine Translation Models for Under-Resourced Neural Machine Translation Scenarios

Sep 28, 2020
Via arXiv