Emile Niyomutabazi

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

May 23, 2023

Cheikh M. Bamba Dione, David Adelani, Peter Nabende, Jesujoba Alabi, Thapelo Sindane, Happy Buzaaba, Shamsuddeen Hassan Muhammad, Chris Chinenye Emezue, Perez Ogayo, Anuoluwapo Aremu(+34 more)

Abstract: In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges of annotating POS tags for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using conditional random fields (CRFs) and several multilingual pre-trained language models. We also applied various cross-lingual transfer models trained on data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves POS tagging performance on the target languages, particularly when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties appears to be more effective for POS tagging in unseen languages.

* Accepted to ACL 2023 (Main conference)

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

May 11, 2023

Odunayo Ogundepo, Tajuddeen R. Gwadabe, Clara E. Rivera, Jonathan H. Clark, Sebastian Ruder, David Ifeoluwa Adelani, Bonaventure F. P. Dossou, Abdou Aziz DIOP, Claytone Sikasote, Gilles Hacheme(+42 more)

Abstract: African languages have far less in-language content available digitally, which makes it challenging for question answering systems to satisfy the information needs of their users. Cross-lingual open-retrieval question answering (XOR QA) systems, which retrieve answer content from other languages while serving people in their native language, offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answers. For this reason, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
