
Badr AlKhamissi

Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models

Jun 28, 2023
Zaid Alyafeai, Maged S. Alshaibani, Badr AlKhamissi, Hamzah Luqman, Ebrahim Alareqi, Ali Fadel

Large language models (LLMs), including ChatGPT, a chat-based model built on top of GPT-3.5 and GPT-4, have demonstrated impressive performance on various downstream tasks without requiring fine-tuning. Although languages other than English account for a smaller proportion of their training data, these models also exhibit remarkable capabilities in those languages. In this study, we assess the performance of the GPT-3.5 and GPT-4 models on seven distinct Arabic NLP tasks: sentiment analysis, translation, transliteration, paraphrasing, part-of-speech tagging, summarization, and diacritization. Our findings reveal that GPT-4 outperforms GPT-3.5 on five of the seven tasks. Furthermore, we conduct an extensive analysis of the sentiment analysis task, providing insights into how LLMs achieve exceptional results on a challenging dialectal dataset. Additionally, we introduce a new Python interface, https://github.com/ARBML/Taqyim, that makes evaluating these tasks straightforward.
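
The Taqyim package's actual interface is not reproduced here, but at its core such an evaluation reduces to a loop over labeled examples. The sketch below is a minimal illustration; `predict` is a hypothetical callable that wraps a chat model and is not Taqyim's API.

```python
def evaluate_accuracy(examples, predict):
    """Score a model on a classification task such as sentiment analysis.

    examples: list of (text, gold_label) pairs.
    predict:  callable mapping text -> predicted label (e.g. a wrapper that
              prompts a chat model and parses its reply; hypothetical here).
    """
    correct = sum(predict(text) == gold for text, gold in examples)
    return correct / len(examples)
```

With a real chat model, `predict` would format a per-example prompt and map the model's free-text reply onto one of the task's labels.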

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

May 19, 2023
Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab

In this paper, we conduct a thorough investigation into the reasoning capabilities of Large Language Models (LLMs), focusing specifically on the Open Pretrained Transformers (OPT) models as a representative of such models. Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations. We then evaluate all models on 57 out-of-domain tasks drawn from the SUPER-NATURALINSTRUCTIONS benchmark, covering 26 distinct reasoning skills, utilizing three prompting techniques. Through a comprehensive grid of 27 configurations and 6,156 test evaluations, we investigate the dimensions of finetuning, prompting, and scale to understand the role of explanations on different reasoning skills. Our findings reveal that including explanations in the few-shot exemplars has no significant impact on the model's performance when the model is finetuned, while positively affecting the non-finetuned counterpart. Moreover, we observe a slight yet consistent increase in classification accuracy as we incorporate explanations during both prompting and finetuning. Finally, we offer insights on which skills benefit the most from incorporating explanations during finetuning and prompting, such as Numerical (+20.4%) and Analogical (+13.9%) reasoning, as well as skills that exhibit negligible or negative effects.

* Preprint 

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification

Sep 30, 2022
Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab

Language models pretrained on large textual data have been shown to encode different types of knowledge simultaneously. Traditionally, only the features from the last layer are used when adapting to new tasks or data. We argue that, when using or finetuning deep pretrained models, intermediate-layer features that may be relevant to the downstream task are buried too deep to be used efficiently in terms of the samples or steps needed. To test this, we propose a new layer fusion method, Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers. We compare DWAtt to a basic concatenation-based layer fusion method (Concat), and compare both to a deeper model baseline -- all kept within a similar parameter budget. Our findings show that DWAtt and Concat are more step- and sample-efficient than the baseline, especially in the few-shot setting. DWAtt outperforms Concat on larger data sizes. On CoNLL-03 NER, layer fusion yields a 3.68-9.73% F1 gain at different few-shot sizes. The layer fusion models presented here significantly outperform the baseline in various training scenarios with different data sizes, architectures, and training constraints.
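
As a rough illustration of the idea (a minimal sketch, not the paper's exact parameterization): treat the final layer's feature for a token as a query and attend over that same token's features at every depth, so relevant intermediate-layer signals can be re-surfaced.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dwatt(layer_feats, Wq, Wk, Wv):
    """Depth-wise attention over one token's per-layer features.

    layer_feats: (n_layers, d) -- the same token's hidden state at each layer.
    Wq, Wk: (d, d_k) projections; Wv: (d, d) value projection.
    The last layer acts as the query; every layer acts as a key/value.
    Returns a (d,) fused feature vector.
    """
    q = layer_feats[-1] @ Wq                 # (d_k,)
    K = layer_feats @ Wk                     # (n_layers, d_k)
    V = layer_feats @ Wv                     # (n_layers, d)
    scores = K @ q / np.sqrt(K.shape[-1])    # (n_layers,)
    alpha = softmax(scores)                  # attention weights over depth
    return alpha @ V                         # weighted sum across layers
```

A concatenation baseline (Concat) would instead flatten all layers' features and project them down with a single linear map; the attention weights here make the depth selection input-dependent.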

* 7 pages, 7 figures 

ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

May 25, 2022
Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next. It is also difficult to collect a large-scale annotated hate speech dataset. In this work, we frame the problem as a few-shot learning task and show significant gains from decomposing the task into its "constituent" parts. In addition, we see that infusing knowledge from reasoning datasets (e.g., Atomic2020) improves performance even further. Moreover, we observe that the trained models generalize to out-of-distribution datasets, showing the superiority of task decomposition and knowledge infusion over previously used methods. Concretely, our method outperforms the baseline by a 17.83% absolute gain in the 16-shot case.
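
In outline, task decomposition replaces one hard judgment with several easier sub-judgments whose answers are combined. The sketch below is purely illustrative: the sub-questions are hypothetical and do not reproduce the paper's actual constituent tasks, and `answer` stands in for any yes/no judge such as a few-shot prompted language model.

```python
def decompose_and_classify(post, answer):
    """Classify a post via task decomposition.

    post:   the text to classify.
    answer: callable mapping a yes/no question (str) -> bool, e.g. a
            few-shot prompted LLM (hypothetical stand-in).
    """
    # Illustrative sub-questions only -- not the paper's decomposition.
    sub_questions = [
        f"Is the following post offensive? {post}",
        f"Does the following post target a group based on a protected attribute? {post}",
    ]
    # Label as hate speech only if every sub-judgment fires.
    return all(answer(q) for q in sub_questions)
```

Each sub-question is simpler than the original task, so a few-shot model can be given focused exemplars per sub-task instead of exemplars for the full, culturally nuanced judgment.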

* Preprint 

Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification

May 16, 2022
Badr AlKhamissi, Mona Diab

In this paper, we tackle the Arabic Fine-Grained Hate Speech Detection shared task and demonstrate significant improvements over reported baselines on its three subtasks: predicting whether a tweet (1) contains offensive language; (2) constitutes hate speech; and, if so, (3) which of six fine-grained hate speech categories it belongs to. Our final solution is an ensemble of models that employs multitask learning and a self-consistency correction method, yielding 82.7% on the hate speech subtask -- a 3.4% relative improvement over previous work.
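
The three subtasks form a label hierarchy, which suggests a simple consistency rule over the per-subtask predictions. The sketch below is an illustrative post-hoc correction under that hierarchy, not necessarily the paper's exact self-consistency method.

```python
def self_correct(offensive, hate, fine_grained):
    """Enforce consistency across the three subtask predictions.

    Illustrative rule derived from the label hierarchy: a tweet can only
    be hate speech if it is offensive, and only carries a fine-grained
    hate label if it is hate speech.

    offensive, hate: bool predictions; fine_grained: category label or None.
    """
    if not offensive:
        hate = False            # non-offensive tweets cannot be hate speech
    if not hate:
        fine_grained = None     # no fine-grained label without hate speech
    return offensive, hate, fine_grained
```

Applied after the multitask heads, such a rule guarantees the three outputs never contradict each other, at the cost of trusting the coarser prediction when they disagree.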

* Accepted at the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5/LREC 2022) 

A Review on Language Models as Knowledge Bases

Apr 12, 2022
Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad

Recently, there has been a surge of interest in the NLP community in using pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge and can thus act as a KB. This has a major advantage over traditional KBs in that the approach requires no human supervision. In this paper, we present a set of aspects that we deem an LM should have to fully act as a KB, and review the recent literature with respect to those aspects.

* Preprint 

How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy

Dec 14, 2021
Badr AlKhamissi, Akshay Srinivasan, Zeb Kurth-Nelson, Sam Ritter

Alchemy is a new meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. Further, Alchemy provides an optional symbolic interface that enables meta-RL research without a large compute budget. In this work, we take the first steps toward using Symbolic Alchemy to identify design choices that enable deep-RL agents to learn various types of abstraction. Then, using a variety of behavioral and introspective analyses we investigate how our trained agents use and represent abstract task variables, and find intriguing connections to the neuroscience of abstraction. We conclude by discussing the next steps for using meta-RL and Alchemy to better understand the representation of abstract variables in the brain.

* Preprint 

Deep Spiking Neural Networks with Resonate-and-Fire Neurons

Sep 16, 2021
Badr AlKhamissi, Muhammad ElNokrashy, David Bernal-Casas

In this work, we explore a new Spiking Neural Network (SNN) formulation with Resonate-and-Fire (RAF) neurons (Izhikevich, 2001) trained with gradient descent via back-propagation. The RAF-SNN, while more biologically plausible, achieves performance comparable to or higher than conventional models in the Machine Learning literature across different network configurations, using similar or fewer parameters. Strikingly, the RAF-SNN proves robust against noise induced at testing/training time, under both static and dynamic conditions. Against CNN on MNIST, we show 25% higher absolute accuracy with N(0, 0.2) induced noise at testing time. Against LSTM on N-MNIST, we show 70% higher absolute accuracy with 20% induced noise at training time.
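
Between spikes, the resonate-and-fire neuron (Izhikevich, 2001) follows linear dynamics dz/dt = (b + iω)z + I over a complex state z, spiking when the imaginary part crosses a threshold. Below is a minimal Euler-discretized simulation of a single neuron; the reset convention and parameter values are illustrative, not the paper's training setup, which additionally backpropagates through these dynamics.

```python
import numpy as np

def simulate_raf(I, omega=2 * np.pi * 10.0, b=-1.0, threshold=1.0, dt=1e-3):
    """Euler simulation of one resonate-and-fire neuron.

    State z = x + i*y evolves as dz/dt = (b + i*omega) z + I(t); the
    damping b < 0 and frequency omega make the neuron a damped oscillator,
    so it responds preferentially to inputs near its resonant frequency.
    A spike is emitted when Im(z) >= threshold, after which z is reset to 0
    (one common convention; reset rules vary across implementations).

    I: 1-D array of input current per time step.
    Returns (trace of complex states, list of spike time indices).
    """
    z = 0.0 + 0.0j
    trace, spikes = [], []
    for t, i_t in enumerate(I):
        z = z + dt * ((b + 1j * omega) * z + i_t)
        if z.imag >= threshold:
            spikes.append(t)
            z = 0.0 + 0.0j
        trace.append(z)
    return np.array(trace), spikes
```

The resonance property is what distinguishes RAF from leaky integrate-and-fire units: a pulse train at the neuron's preferred frequency accumulates, while off-frequency input cancels itself out.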

* Preprint 

The Emergence of Abstract and Episodic Neurons in Episodic Meta-RL

Apr 07, 2021
Badr AlKhamissi, Muhammad ElNokrashy, Michael Spranger

In this work, we analyze the reinstatement mechanism introduced by Ritter et al. (2018) to reveal two classes of neurons that emerge in the agent's working memory (an epLSTM cell) when trained using episodic meta-RL on an episodic variant of the Harlow visual fixation task. Specifically, Abstract neurons encode knowledge shared across tasks, while Episodic neurons carry information relevant for a specific episode's task.

* This work was accepted at the Learning to Learn Workshop (ICLR 2021) 

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Mar 01, 2021
Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all four of its subtasks: identifying the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at both the country and province levels. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% for DA on the country-level development set -- an improvement of 7.63% over previous work.
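
Combining model variants can be as simple as majority voting over per-example predictions; the sketch below shows that baseline scheme (the paper's actual combination method may differ, e.g. averaging class probabilities instead of hard votes).

```python
from collections import Counter

def ensemble_predict(predictions):
    """Majority vote across model variants.

    predictions: list of lists -- predictions[m][i] is model m's predicted
    label (e.g. a country code) for example i. Ties resolve to the label
    predicted by the earliest model in the list, since Counter preserves
    insertion order among equal counts.
    """
    n_examples = len(predictions[0])
    fused = []
    for i in range(n_examples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        fused.append(votes.most_common(1)[0][0])
    return fused
```

Voting over hard labels is robust to one variant's miscalibrated confidences, whereas probability averaging lets a confident minority override the majority; which works better is an empirical question per task.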

* This work was accepted at the Sixth Arabic Natural Language Processing Workshop (EACL/WANLP 2021) 