Abdul Waheed

VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System

Oct 27, 2023
Abdul Waheed, Bashar Talafha, Peter Sullivan, AbdelRahim Elmadany, Muhammad Abdul-Mageed

Arabic is a complex language with many varieties and dialects spoken by more than 450 million people around the world. Owing to this linguistic diversity and variation, it is challenging to build a robust and generalized ASR system for Arabic. In this work, we address this gap by developing and demoing a system, dubbed VoxArabica, for dialect identification (DID) as well as automatic speech recognition (ASR) of Arabic. We train a wide range of models, such as HuBERT (DID), Whisper, and XLS-R (ASR), in a supervised setting for Arabic DID and ASR tasks. Our DID models are trained to identify 17 different dialects in addition to MSA. We finetune our ASR models on MSA, Egyptian, Moroccan, and mixed data. Additionally, for the remaining dialects in ASR, we provide the option to choose among various models, such as Whisper and MMS, in a zero-shot setting. We integrate these models into a single web interface with diverse features such as audio recording, file upload, model selection, and the option to flag incorrect outputs. Overall, we believe VoxArabica will be useful for a wide range of audiences concerned with Arabic research. Our system is currently running at https://cdce-206-12-100-168.ngrok.io/.
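The pipeline above routes an input utterance through DID first and then to a dialect-appropriate ASR model. A minimal sketch of that routing step, assuming hypothetical model identifiers (the names below are illustrative placeholders, not the demo's actual checkpoints):

```python
# Dialects for which dedicated finetuned ASR models exist in the demo;
# every other dialect falls back to a zero-shot multilingual model
# (Whisper or MMS). All model names here are placeholders.
FINETUNED = {
    "MSA": "whisper-ft-msa",
    "Egyptian": "whisper-ft-egy",
    "Moroccan": "whisper-ft-mor",
}

def select_asr_model(dialect: str, zero_shot: str = "whisper-large-v2") -> str:
    """Map a DID prediction to the ASR model that should transcribe it."""
    return FINETUNED.get(dialect, zero_shot)
```

The fallback argument lets the interface expose the zero-shot model choice (Whisper vs. MMS) described in the abstract.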

* Accepted at ArabicNLP conference co-located with EMNLP'23. First three authors contributed equally 

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

Aug 06, 2023
Karima Kadaoui, Samar M. Magdy, Abdul Waheed, Md Tawkat Islam Khondaker, Ahmed Oumar El-Shangiti, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Large language models (LLMs) finetuned to follow human instructions have recently emerged as a breakthrough in AI. Models such as Google Bard and OpenAI ChatGPT, for example, are surprisingly powerful tools for question answering, code debugging, and dialogue generation. Despite the purported multilingual proficiency of these models, their linguistic inclusivity remains insufficiently explored. Considering this constraint, we present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5 and GPT-4) regarding their machine translation proficiencies across ten varieties of Arabic. Our evaluation covers diverse Arabic varieties such as Classical Arabic, Modern Standard Arabic, and several nuanced dialectal variants. Furthermore, we undertake a human-centric study to scrutinize the efficacy of the most recent model, Bard, in following human instructions during translation tasks. Our exhaustive analysis indicates that LLMs may encounter challenges with certain Arabic dialects, particularly those for which minimal public data exists, such as the Algerian and Mauritanian dialects. However, they exhibit satisfactory performance with more prevalent dialects, albeit occasionally trailing behind established commercial systems like Google Translate. Additionally, our analysis reveals a circumscribed capability of Bard in aligning with human instructions in translation contexts. Collectively, our findings underscore that prevailing LLMs remain far from inclusive, with only limited ability to cater to the linguistic and cultural intricacies of diverse communities.
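Translation quality across Arabic varieties is typically scored with automatic metrics alongside human judgment. As an illustration, a simplified, unsmoothed character n-gram F-score in the spirit of chrF (character-level metrics suit morphologically rich languages; a real evaluation would use a library such as sacreBLEU, and this is not necessarily the metric the paper used):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams, ignoring spaces, as in chrF-style metrics.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_score(hypothesis: str, reference: str, max_n: int = 3) -> float:
    """Average character n-gram F1 between hypothesis and reference."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # strings too short for this n
        overlap = sum((hyp & ref).values())
        p = overlap / sum(hyp.values())
        r = overlap / sum(ref.values())
        scores.append(0.0 if p + r == 0 else 2 * p * r / (p + r))
    return sum(scores) / len(scores) if scores else 0.0
```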

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Jun 05, 2023
Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings. However, it is not clear how Whisper would fare under diverse conditions, even on languages it was evaluated on, such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. Our evaluation covers most publicly available Arabic speech data and is performed under n-shot (zero-, few-, and full) finetuning. We also investigate Whisper's robustness under completely novel conditions, such as dialect-accented standard Arabic and unseen dialects, for which we develop evaluation data. Our experiments show that although zero-shot Whisper outperforms fully finetuned XLS-R models on all datasets, its performance deteriorates significantly in the zero-shot setting for five unseen dialects (i.e., Algeria, Jordan, Palestine, UAE, and Yemen).
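ASR benchmarking of this kind is scored with word error rate (WER): the word-level edit distance between hypothesis and reference, normalized by reference length. A minimal stdlib implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)
```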

* 4 pages, INTERSPEECH 2023 

GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP

May 24, 2023
Md Tawkat Islam Khondaker, Abdul Waheed, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

The recent emergence of ChatGPT has brought a revolutionary change to the landscape of NLP. Although ChatGPT has consistently shown impressive performance on English benchmarks, its exact capabilities in most other languages remain largely unknown. To better understand ChatGPT's capabilities in Arabic, we present a large-scale evaluation of the model on a broad range of Arabic NLP tasks. Namely, we evaluate ChatGPT on 32 diverse natural language understanding and generation tasks over 60 different datasets. To the best of our knowledge, our work offers the first performance analysis of ChatGPT on Arabic NLP at such a massive scale. Our results show that, despite its success on English benchmarks, ChatGPT evaluated in-context (few-shot) is consistently outperformed by much smaller dedicated models finetuned on Arabic. These results suggest that there is significant room for improvement for instruction-tuned LLMs such as ChatGPT.
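In-context (few-shot) evaluation of this kind prepends solved exemplars to each query. A toy prompt builder, with illustrative field labels rather than the paper's actual template:

```python
def build_few_shot_prompt(instruction, exemplars, query):
    """Assemble a k-shot prompt from (input, output) exemplar pairs.
    The 'Input:'/'Output:' labels are illustrative, not the paper's format."""
    parts = [instruction]
    for x, y in exemplars:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")  # model completes this slot
    return "\n\n".join(parts)
```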

* Work in progress 

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Apr 27, 2023
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji

Large language models (LLMs) with instruction finetuning demonstrate superior generative capabilities. However, these models are resource intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly generated instructions. In addition to being sizeable, we design our instructions to cover a broad set of topics to ensure diversity. A thorough investigation of our instruction data demonstrates their diversity, and we generate responses for these instructions using gpt-3.5-turbo. We then exploit the instructions to tune a host of models, dubbed LaMini-LM, of varying sizes, from both the encoder-decoder and the decoder-only families. We evaluate our models both automatically (on 15 different NLP benchmarks) and manually. Results show that our proposed LaMini-LM models are on par with competitive baselines while being nearly 10 times smaller in size.
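The distillation recipe above pairs a large instruction set with teacher-generated responses. A schematic assembly step, with `generate_response` standing in for a call to the teacher model (the dedup heuristic below is an assumption for illustration, not the paper's procedure):

```python
def assemble_distillation_set(instructions, generate_response):
    """Pair each unique instruction with a teacher-generated response.
    `generate_response` is any callable; in the paper the teacher is
    gpt-3.5-turbo, but any text-in/text-out function works here."""
    seen, dataset = set(), []
    for inst in instructions:
        key = " ".join(inst.split()).lower()  # cheap whitespace/case dedup
        if key in seen:
            continue
        seen.add(key)
        dataset.append({"instruction": inst,
                        "response": generate_response(inst)})
    return dataset
```

The resulting list of instruction/response pairs is what the smaller student models are then finetuned on.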

* Work in progress, 20 pages, 8 figures, 13 tables 

Speaker and Time-aware Joint Contextual Learning for Dialogue-act Classification in Counselling Conversations

Nov 12, 2021
Ganeshan Malhotra, Abdul Waheed, Aseem Srivastava, Md Shad Akhtar, Tanmoy Chakraborty

The onset of the COVID-19 pandemic has put people's mental health at risk. Social counselling has gained remarkable significance in this environment. Unlike general goal-oriented dialogues, a conversation between a patient and a therapist is considerably implicit, even though the objective of the conversation is quite apparent. In such a case, understanding the intent of the patient is imperative for providing effective counselling in therapy sessions, and the same applies to a dialogue system as well. In this work, we take a small but important step forward in the development of an automated dialogue system for mental-health counselling. We develop a novel dataset, named HOPE, to provide a platform for dialogue-act classification in counselling conversations. We identify the requirements of such conversations and propose twelve domain-specific dialogue-act (DAC) labels. We collect 12.9K utterances from publicly available counselling session videos on YouTube, extract their transcripts, clean them, and annotate them with DAC labels. Further, we propose SPARTA, a transformer-based architecture with novel speaker- and time-aware contextual learning for dialogue-act classification. Our evaluation shows convincing performance over several baselines, achieving state-of-the-art results on HOPE. We also supplement our experiments with extensive empirical and qualitative analyses of SPARTA.
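As a toy illustration of speaker- and time-aware context (not SPARTA's actual attention mechanism), each utterance can be paired with its recent history, tagged with a same-speaker flag and a recency index:

```python
def context_features(utterances, window=3):
    """For each (speaker, text) pair, collect up to `window` preceding
    utterances as (text, same_speaker, recency) tuples. Purely a sketch
    of the speaker/time signals a classifier could attend over."""
    features = []
    for i, (speaker, text) in enumerate(utterances):
        start = max(0, i - window)
        ctx = [(prev_text, prev_speaker == speaker, i - j)
               for j, (prev_speaker, prev_text)
               in enumerate(utterances[start:i], start=start)]
        features.append({"text": text, "context": ctx})
    return features
```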

* 9 pages; Accepted to WSDM 2022 

BloomNet: A Robust Transformer based model for Bloom's Learning Outcome Classification

Aug 16, 2021
Abdul Waheed, Muskan Goyal, Nimisha Mittal, Deepak Gupta, Ashish Khanna, Moolchand Sharma

Bloom's taxonomy is a common paradigm for categorizing educational learning objectives into three learning domains: cognitive, affective, and psychomotor. For the optimization of educational programs, it is crucial to design course learning outcomes (CLOs) according to the different cognitive levels of Bloom's taxonomy. Usually, administrators of institutions manually complete the tedious work of mapping CLOs and examination questions to Bloom's taxonomy levels. To address this issue, we propose a transformer-based model named BloomNet that captures linguistic as well as semantic information to classify course learning outcomes (CLOs). We compare BloomNet with a diverse set of basic as well as strong baselines, and we observe that our model performs better than all the experimented baselines. Further, we also test the generalization capability of BloomNet by evaluating it on different distributions that it does not encounter during training, and we observe that our model is less susceptible to distribution shift compared to the other considered models. We support our findings with extensive result analysis. In an ablation study, we observe that explicitly encapsulating linguistic information along with semantic information improves the model's IID (independent and identically distributed) performance as well as its OOD (out-of-distribution) generalization capability.
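A classic non-learned baseline for this task maps a CLO's action verb to a cognitive level via keyword lists. BloomNet itself is a learned transformer, but a lookup like this illustrates the label space (verb lists abbreviated and illustrative):

```python
# Illustrative verb cues for the six cognitive levels of Bloom's taxonomy.
LEVEL_VERBS = {
    "remember":   {"define", "list", "recall", "state"},
    "understand": {"explain", "summarize", "describe", "classify"},
    "apply":      {"use", "implement", "solve", "demonstrate"},
    "analyze":    {"compare", "differentiate", "examine"},
    "evaluate":   {"justify", "critique", "assess"},
    "create":     {"design", "develop", "formulate"},
}

def keyword_bloom_level(clo: str) -> str:
    """Naive baseline: the first matching action verb decides the level."""
    for word in clo.lower().replace(",", " ").split():
        for level, verbs in LEVEL_VERBS.items():
            if word in verbs:
                return level
    return "unknown"
```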

* Bloom's Taxonomy, Natural Language Processing, Transformer, Robustness and Generalization 

CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved Covid-19 Detection

Mar 08, 2021
Abdul Waheed, Muskan Goyal, Deepak Gupta, Ashish Khanna, Fadi Al-Turjman, Placido Rogerio Pinheiro

Coronavirus disease (COVID-19) is a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The spread of COVID-19 has had a detrimental effect on the global economy and health. Chest X-rays of infected patients are a crucial tool in the battle against COVID-19, and early results suggest that abnormalities suggestive of COVID-19 exist in patients' chest X-rays. This has led to the introduction of a variety of deep learning systems, and studies have shown that the accuracy of COVID-19 patient detection through the use of chest X-rays is highly promising. Deep learning networks like convolutional neural networks (CNNs) need a substantial amount of training data, and because the outbreak is recent, it is difficult to gather a significant number of radiographic images in such a short time. Therefore, in this research, we present a method to generate synthetic chest X-ray (CXR) images by developing an Auxiliary Classifier Generative Adversarial Network (ACGAN)-based model called CovidGAN. In addition, we demonstrate that the synthetic images produced by CovidGAN can be utilized to enhance the performance of a CNN for COVID-19 detection. Classification using the CNN alone yielded 85% accuracy; adding synthetic images produced by CovidGAN increased the accuracy to 95%. We hope this method will speed up COVID-19 detection and lead to more robust systems of radiology.
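The "auxiliary classifier" in an ACGAN conditions generation on a class label, typically by feeding the generator a noise vector combined with the label. A pure-Python sketch of that conditioning idea only (the real CovidGAN generator is a deep convolutional network, not this):

```python
import random

def make_conditional_noise(label, num_classes, noise_dim, rng=random):
    """ACGAN-style generator input: Gaussian noise concatenated with a
    one-hot class label, so samples can be generated per class
    (e.g. COVID-positive vs. normal chest X-rays)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return noise + one_hot
```

During training, the discriminator's auxiliary head predicts the label back from the generated image, which is what ties each sample to its intended class.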

* IEEE Access, vol. 8, pp. 91916-91923, 2020  
* Accepted at IEEE Access. Received April 30, 2020, accepted May 11, 2020, date of publication May 14, 2020, date of current version May 28, 2020 