
Tejumade Afonja


AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR

Sep 30, 2023
Tobi Olatunji, Tejumade Afonja, Aditya Yadavalli, Chris Chinenye Emezue, Sahib Singh, Bonaventure F. P. Dossou, Joanne Osuchukwu, Salomey Osei, Atnafu Lambebo Tonja, Naome Etori, Clinton Mbataku

Figures 1–4 for AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR

Africa has a very low doctor-to-patient ratio. At very busy clinics, doctors can see 30+ patients per day, a heavy patient burden compared with developed countries, yet productivity tools such as clinical automatic speech recognition (ASR) are lacking for these overworked clinicians. In contrast, clinical ASR is mature, even ubiquitous, in developed nations, and clinician-reported performance of commercial clinical ASR systems is generally satisfactory. Furthermore, the performance of general-domain ASR is approaching human accuracy. Several gaps remain, however: publications have highlighted racial bias in speech-to-text algorithms, and performance on minority accents lags significantly. To our knowledge, there is no publicly available research or benchmark on accented African clinical ASR, and speech data is non-existent for the majority of African accents. We release AfriSpeech: 200 hours of Pan-African English speech, comprising 67,577 clips from 2,463 unique speakers across 120 indigenous accents from 13 countries, for clinical and general-domain ASR, together with a benchmark test set and publicly available pre-trained models that achieve SOTA performance on the AfriSpeech benchmark.
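ASR benchmarks like this one are conventionally scored by word error rate (WER), the word-level edit distance between the reference transcript and the model's hypothesis, divided by the reference length. A minimal sketch of the standard metric (not this paper's evaluation code; the clinical example sentence is invented for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("presented" -> "present") and one insertion ("a")
# against a 5-word reference gives WER 2/5 = 0.4.
print(wer("the patient presented with fever", "the patient present with a fever"))
```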

* Accepted to TACL 2023. This is a pre-MIT Press publication version 

MargCTGAN: A "Marginally" Better CTGAN for the Low Sample Regime

Jul 16, 2023
Tejumade Afonja, Dingfan Chen, Mario Fritz

The potential of realistic and useful synthetic data is significant. However, current evaluation methods for synthetic tabular data generation predominantly focus on downstream task usefulness, often neglecting the importance of statistical properties. This oversight becomes particularly prominent in low-sample scenarios, where these statistical measures deteriorate swiftly. In this paper, we address this issue by evaluating three state-of-the-art synthetic tabular data generators on marginal distribution, column-pair correlation, joint distribution, and downstream task utility across high- to low-sample regimes. The popular CTGAN model shows strong downstream utility overall but underperforms in low-sample settings. To overcome this limitation, we propose MargCTGAN, which adds feature matching of de-correlated marginals, yielding consistent improvements in both downstream utility and the statistical properties of the synthetic data.
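The core idea, matching marginal statistics in a de-correlated feature space, can be sketched as a simple numpy loss term. This is an illustrative reading, not the paper's implementation: here the de-correlating basis is taken from a PCA of the real batch, and only first and second moments of each projected marginal are matched.

```python
import numpy as np

def decorrelated_marginal_loss(real: np.ndarray, synth: np.ndarray) -> float:
    """Sketch of feature matching on de-correlated marginals.

    Project real and synthetic batches onto the eigenbasis of the real
    data's covariance (a de-correlating transform), then penalize gaps
    between the per-dimension means and standard deviations.
    """
    mu = real.mean(axis=0)
    cov = np.cov(real - mu, rowvar=False)
    _, basis = np.linalg.eigh(cov)          # orthogonal de-correlating basis
    z_real = (real - mu) @ basis
    z_synth = (synth - mu) @ basis
    mean_gap = np.abs(z_real.mean(axis=0) - z_synth.mean(axis=0)).mean()
    std_gap = np.abs(z_real.std(axis=0) - z_synth.std(axis=0)).mean()
    return float(mean_gap + std_gap)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
print(decorrelated_marginal_loss(x, x))        # identical batches: zero loss
print(decorrelated_marginal_loss(x, x + 1.0))  # mean-shifted batch: positive loss
```

In a GAN training loop such a term would be added to the generator objective alongside the adversarial loss; the paper's actual matching criterion and weighting may differ.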

* ICML 2023 Workshop on Deployable Generative AI 

AfriNames: Most ASR models "butcher" African Names

Jun 02, 2023
Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Chris Chinenye Emezue, Amina Mardiyyah Rufai, Sahib Singh

Figures 1–4 for AfriNames: Most ASR models "butcher" African Names

Useful conversational agents must accurately capture named entities to minimize error for downstream tasks, for example, asking a voice assistant to play a track from a certain artist, initiating navigation to a specific location, or documenting a laboratory result for a patient. However, when named entities such as "Ukachukwu" (Igbo), "Lakicia" (Swahili), or "Ingabire" (Rwandan) are spoken, the performance of automatic speech recognition (ASR) models degrades significantly, propagating errors to downstream systems. We model this problem as a distribution shift and demonstrate that such model bias can be mitigated through multilingual pre-training, intelligent data augmentation strategies that increase the representation of African named entities, and fine-tuning multilingual ASR models on multiple African accents. The resulting fine-tuned models show an 81.5% relative WER improvement compared with the baseline on samples with African named entities.
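One simple form such an augmentation strategy can take is entity substitution: swapping sampled African names into transcript templates so the fine-tuning data contains them more often. The name pool, the `<NAME>` placeholder convention, and the function below are all hypothetical illustrations, not the paper's pipeline:

```python
import random

# Hypothetical pool of under-represented named entities (examples from the abstract).
AFRICAN_NAMES = ["Ukachukwu", "Lakicia", "Ingabire", "Chinenye", "Adeola"]

def augment_transcript(template: str, rng: random.Random,
                       placeholder: str = "<NAME>") -> str:
    """Fill a transcript template with a sampled African named entity,
    increasing its frequency in the fine-tuning text distribution."""
    return template.replace(placeholder, rng.choice(AFRICAN_NAMES))

rng = random.Random(0)
print(augment_transcript("play the latest track by <NAME>", rng))
print(augment_transcript("document lab results for patient <NAME>", rng))
```

A full pipeline would pair each augmented transcript with matching audio (recorded or synthesized), which is where most of the practical difficulty lies.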

* Accepted at Interspeech 2023 (Main Conference) 

Proceedings of the NeurIPS 2021 Workshop on Machine Learning for the Developing World: Global Challenges

Jan 10, 2023
Paula Rodriguez Diaz, Tejumade Afonja, Konstantin Klemmer, Aya Salama, Niveditha Kalavakonda, Oluwafemi Azeez, Simone Fobi

These are the proceedings of the 5th workshop on Machine Learning for the Developing World (ML4D), held as part of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) on December 14th, 2021.


Generative Extraction of Audio Classifiers for Speaker Identification

Jul 26, 2022
Tejumade Afonja, Lucas Bourtoule, Varun Chandrasekaran, Sageev Oore, Nicolas Papernot

Figures 1–4 for Generative Extraction of Audio Classifiers for Speaker Identification

It is perhaps no longer surprising that machine learning models, especially deep neural networks, are particularly vulnerable to attacks. One such vulnerability that has been well studied is model extraction: a phenomenon in which the attacker attempts to steal a victim's model by training a surrogate model to mimic the decision boundaries of the victim model. Previous works have demonstrated the effectiveness of such an attack and its devastating consequences, but much of this work has been done primarily for image and text processing tasks. Our work is the first attempt to perform model extraction on audio classification models. We are motivated by an attacker whose goal is to mimic the behavior of the victim's model trained to identify a speaker. This is particularly problematic in security-sensitive domains such as biometric authentication. We find that prior model extraction techniques, where the attacker naively uses a proxy dataset to attack a potential victim's model, fail. We therefore propose the use of a generative model to create a sufficiently large and diverse pool of synthetic attack queries. We find that our approach is able to extract a victim's model trained on LibriSpeech using queries synthesized with a proxy dataset based on VoxCeleb; we achieve a test accuracy of 84.41% with a budget of 3 million queries.
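The query-then-imitate loop at the heart of extraction can be sketched in a few lines of numpy. Everything here is a stand-in: the "victim" is a fixed linear classifier rather than a speaker-identification network, and the "generator" is just a Gaussian sampler, whereas the paper trains a generative model on a proxy corpus to synthesize diverse audio queries.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_queries = 8, 5000

# Stand-in victim: a fixed linear classifier the attacker can only query.
w_victim = rng.normal(size=d)
victim = lambda X: (X @ w_victim > 0).astype(int)

# Stand-in "generator": Gaussian samples playing the role of synthetic queries.
queries = rng.normal(size=(n_queries, d))
labels = victim(queries)  # attacker labels queries via the victim's API

# Train a surrogate on the (query, label) pairs via least squares on {-1, +1}.
w_surrogate = np.linalg.lstsq(queries, 2.0 * labels - 1.0, rcond=None)[0]

# Fidelity of the extraction: agreement with the victim on fresh inputs.
test = rng.normal(size=(2000, d))
agreement = ((test @ w_surrogate > 0).astype(int) == victim(test)).mean()
print(f"surrogate/victim agreement: {agreement:.2%}")
```

The paper's finding is about the query distribution: when the stand-in sampler is replaced by queries far from the victim's training manifold, extraction degrades, which is why a generative model fitted to proxy data is needed.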


Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus

Dec 12, 2021
Tejumade Afonja, Oladimeji Mudele, Iroro Orife, Kenechi Dukor, Lawrence Francis, Duru Goodness, Oluwafemi Azeez, Ademola Malomo, Clinton Mbataku

Figures 1–4 for Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus

This paper describes foundational efforts with SautiDB-Naija, a novel corpus of non-native (L2) Nigerian English speech. We describe how the corpus was created and curated, as well as preliminary experiments with accent classification and learning Nigerian accent embeddings. The initial version of the corpus includes over 900 recordings from L2 English speakers of Nigerian languages, such as Yoruba, Igbo, Edo, Efik-Ibibio, and Igala. We further demonstrate how fine-tuning a pre-trained model like wav2vec can yield representations suitable for related speech tasks such as accent classification. SautiDB-Naija has been published to Zenodo for general use under a flexible Creative Commons License.
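Once each utterance is reduced to a fixed-length embedding (e.g. by pooling wav2vec features), accent classification becomes an ordinary supervised problem over those vectors. A minimal sketch with synthetic stand-in embeddings and a nearest-class-mean classifier; the real work would use actual wav2vec features, and the paper's classifier may differ:

```python
import numpy as np

ACCENTS = ["yoruba", "igbo", "edo", "efik-ibibio", "igala"]

def fit_class_means(emb: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """One mean embedding per accent class."""
    return np.stack([emb[labels == k].mean(axis=0) for k in range(len(ACCENTS))])

def predict(means: np.ndarray, emb: np.ndarray) -> np.ndarray:
    """Assign each utterance embedding to the nearest class mean."""
    dists = np.linalg.norm(emb[:, None, :] - means[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Synthetic stand-in embeddings: one Gaussian cluster per accent.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(len(ACCENTS), 16))
labels = rng.integers(0, len(ACCENTS), size=500)
emb = centers[labels] + rng.normal(size=(500, 16))

means = fit_class_means(emb, labels)
accuracy = (predict(means, emb) == labels).mean()
print(f"accent classification accuracy: {accuracy:.2%}")
```

The interesting empirical question, which the synthetic clusters sidestep, is whether fine-tuned wav2vec embeddings actually separate by accent this cleanly.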


Proceedings of the NeurIPS 2020 Workshop on Machine Learning for the Developing World: Improving Resilience

Jan 12, 2021
Tejumade Afonja, Konstantin Klemmer, Aya Salama, Paula Rodriguez Diaz, Niveditha Kalavakonda, Oluwafemi Azeez

These are the proceedings of the 4th workshop on Machine Learning for the Developing World (ML4D), held as part of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS) on Saturday, December 12th 2020.
