One crucial aspect of sentiment analysis is negation handling: a negation can flip the sentiment of a sentence and thereby degrade machine learning-based sentiment classification. The role of negation in Arabic sentiment analysis has been explored only to a limited extent, especially for colloquial Arabic. In this paper, the author addresses the negation problem in machine learning-based sentiment classification for colloquial Arabic. To this end, the author proposes a simple rule-based algorithm for handling the problem; the rules were crafted by observing many cases of negation. Additionally, simple linguistic knowledge and a sentiment lexicon are used for this purpose. The author also examines the impact of the proposed algorithm on the performance of different machine learning algorithms. The results given by the proposed algorithm are compared with three baseline models. The experimental results show that using the proposed algorithm improves the classifiers' accuracy, precision, and recall relative to the baselines.
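As a rough illustration of what rule-based negation handling can look like, the sketch below flips the polarity of the first sentiment word that follows a negation particle within a small window. It is not the author's rule set: the English `NEGATORS` and `LEXICON` entries, the `window` parameter, and the `NOT_` prefix are all assumptions standing in for colloquial Arabic particles, a dialect lexicon, and the paper's actual rules.

```python
# Hypothetical English stand-ins; a real system would use colloquial Arabic
# negation particles and a dialect-specific sentiment lexicon.
NEGATORS = {"not", "never", "no"}
LEXICON = {"good": 1, "tasty": 1, "bad": -1, "slow": -1}


def mark_negation(tokens, window=2):
    """Return (features, score) for a tokenized sentence.

    A sentiment word found within `window` tokens after a negator is
    emitted as NOT_<word> and its polarity is flipped in the score.
    """
    features = []
    score = 0
    scope = 0  # number of upcoming tokens still inside a negation scope
    for token in tokens:
        word = token.lower()
        if word in NEGATORS:
            scope = window
            continue  # the negator itself is not kept as a feature
        polarity = LEXICON.get(word, 0)
        if scope > 0 and polarity != 0:
            features.append("NOT_" + word)
            score -= polarity  # flipped polarity
            scope = 0  # the scope closes after one sentiment word
        else:
            features.append(word)
            score += polarity
            scope = max(scope - 1, 0)
    return features, score


features, score = mark_negation("the food was not very good".split())
print(features, score)
```

The rewritten tokens could then be passed to any bag-of-words classifier, so that a negated word and its plain form become distinct features.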
Unlike other languages, Arabic has a morphological complexity that makes Arabic sentiment analysis a challenging task. Moreover, the presence of dialects in Arabic texts makes the sentiment analysis task more challenging, due to the absence of specific rules that govern the writing or speaking system. Generally, one of the problems of sentiment analysis is the high dimensionality of the feature vector. To resolve this problem, many feature selection methods have been proposed. These selection methods have been investigated widely for English, but not for dialectal Arabic. This work investigates the effect of feature selection methods and their combinations on dialectal Arabic sentiment classification. The feature selection methods are Information Gain (IG), Correlation, Support Vector Machine (SVM), Gini Index (GI), and Chi-Square. A number of experiments were carried out on dialectal Jordanian reviews using an SVM classifier. Furthermore, the effects of different term weighting schemes, stemmers, stop word removal, and feature models on performance were investigated. The experimental results showed that the best performance of the SVM classifier was obtained when the SVM and correlation feature selection methods were combined with the unigram model.
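To make the idea of feature selection concrete, the following minimal sketch scores unigram features with the Chi-Square statistic, one of the methods listed above, and keeps the top-ranked terms. The toy English documents, the labels, and the function names `chi_square` and `select_top_k` are illustrative assumptions; the study's dataset, preprocessing, and method combinations are not reproduced here.

```python
def chi_square(docs, labels, term, target):
    """Chi-Square score of `term` for class `target`.

    `docs` is a list of token sets; the score is computed from the 2x2
    table of term presence versus class membership.
    """
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != target)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == target)
    d_absent = n - a - b - c  # documents outside the class without the term
    denom = (a + b) * (c + d_absent) * (a + c) * (b + d_absent)
    if denom == 0:
        return 0.0  # term (or class) occurs in all or none of the documents
    return n * (a * d_absent - b * c) ** 2 / denom


def select_top_k(docs, labels, k, target="pos"):
    """Return the k terms with the highest score (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, target), t))
    return ranked[:k]


# Toy data (assumption): four tiny "reviews" as sets of unigrams.
DOCS = [{"good", "food"}, {"good", "service"}, {"bad", "food"}, {"bad", "service"}]
LABELS = ["pos", "pos", "neg", "neg"]

print(select_top_k(DOCS, LABELS, k=2))
```

On this toy data the polarity words separate the classes perfectly, while "food" and "service" appear equally in both classes and score zero, which is the behaviour a selection step relies on to shrink the feature vector.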
Question processing is a fundamental step in a question answering (QA) application, and its quality affects the performance of the QA application. The major challenge in question processing is how to extract the semantics of natural language questions (NLQs). Human language is ambiguous, and ambiguity may occur at two levels: lexical and syntactic. In this paper, we propose a new approach for resolving the lexical ambiguity problem by integrating context knowledge and domain concept knowledge into shallow natural language processing (SNLP) techniques. Concept knowledge is modeled using an ontology, while context knowledge is obtained from WordNet and is determined based on the neighboring words in a question. The approach will be applied to a university QA system.
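One way to picture how neighboring words and domain concepts might jointly resolve a lexically ambiguous word is the overlap-based sketch below. It is only an assumption-laden illustration, not the proposed approach: the `SENSES` table stands in for WordNet-derived context words, `DOMAIN_CONCEPTS` stands in for the university ontology, and the scoring rule (overlap count plus a fixed domain bonus) is hypothetical.

```python
# Hypothetical sense inventory: each sense maps to words that typically
# appear near it (a stand-in for context knowledge drawn from WordNet).
SENSES = {
    "course": {
        "course.academic": {"lecture", "credit", "semester", "register", "instructor"},
        "course.meal": {"dinner", "dish", "dessert", "menu"},
        "course.route": {"path", "direction", "river", "ship"},
    },
}

# Hypothetical set of senses that exist as concepts in the domain ontology.
DOMAIN_CONCEPTS = {"course.academic"}


def disambiguate(word, question_tokens, window=3, domain_bonus=1):
    """Pick the sense of `word` that best matches its neighboring words."""
    senses = SENSES.get(word)
    if not senses or word not in question_tokens:
        return None
    idx = question_tokens.index(word)
    start = max(0, idx - window)
    neighbors = set(question_tokens[start:idx])
    neighbors |= set(question_tokens[idx + 1: idx + 1 + window])

    def score(sense):
        overlap = len(neighbors & senses[sense])
        bonus = domain_bonus if sense in DOMAIN_CONCEPTS else 0
        return overlap + bonus

    # Iterating over sorted names makes tie-breaking deterministic.
    return max(sorted(senses), key=score)


question = "which instructor teaches this course next semester".split()
print(disambiguate("course", question))
```

The domain bonus acts only as a prior in this sketch: a question whose neighboring words clearly point to another sense can still override it.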
One of the main difficulties in sentiment analysis of Arabic is the presence of colloquialisms. In this paper, we examine the effect of using objective words in conjunction with sentimental words on sentiment classification of colloquial Arabic reviews, specifically Jordanian colloquial reviews. Reviews often include both sentimental and objective words; however, most existing sentiment analysis models ignore the objective words because they are considered useless. In this work, we created two lexicons: the first includes colloquial sentimental words and compound phrases, while the second contains objective words associated with sentiment tendency values based on a particular estimation method. We used these lexicons to extract sentiment features that serve as training input to a Support Vector Machine (SVM), which classifies the sentiment polarity of the reviews. The review dataset was collected manually from the JEERAN website. The experimental results show that the proposed approach improves polarity classification in comparison to two baseline models, reaching an accuracy of 95.6%.
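The two-lexicon idea can be sketched as a small feature extractor that turns a review into a numeric vector suitable as classifier input. Everything specific here is assumed for illustration: the English lexicon entries, the tendency values, and the three-feature layout are hypothetical, and the paper's estimation method for objective words is not reproduced.

```python
# Hypothetical lexicons; the real ones contain Jordanian colloquial entries.
# Sentiment lexicon: words and compound phrases mapped to a polarity.
SENTIMENT_LEXICON = {"delicious": 1, "terrible": -1, "worth it": 1}
# Objective lexicon: words mapped to an assumed sentiment tendency value.
OBJECTIVE_LEXICON = {"price": -0.25, "portion": 0.5, "queue": -0.5}


def extract_features(review):
    """Return [positive_count, negative_count, objective_tendency_sum]."""
    tokens = review.lower().split()
    # Include bigrams so two-word compound phrases can be matched.
    candidates = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    positive = sum(1 for c in candidates if SENTIMENT_LEXICON.get(c, 0) > 0)
    negative = sum(1 for c in candidates if SENTIMENT_LEXICON.get(c, 0) < 0)
    objective = sum(OBJECTIVE_LEXICON.get(t, 0.0) for t in tokens)
    return [positive, negative, objective]


print(extract_features("Delicious food and the portion was worth it"))
print(extract_features("terrible queue and high price"))
```

Vectors of this kind, one per review and paired with polarity labels, are what an SVM implementation would take as training input; the third feature is where objective words contribute information that a sentiment-only model would discard.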