The complexities of Arabic language in morphology, orthography and dialects makes sentiment analysis for Arabic more challenging. Also, text feature extraction from short messages like tweets, in order to gauge the sentiment, makes this task even more difficult. In recent years, deep neural networks were often employed and showed very good results in sentiment classification and natural language processing applications. Word embedding, or word distributing approach, is a current and powerful tool to capture together the closest words from a contextual text. In this paper, we describe how we construct Word2Vec models from a large Arabic corpus obtained from ten newspapers in different Arab countries. By applying different machine learning algorithms and convolutional neural networks with different text feature selections, we report improved accuracy of sentiment classification (91%-95%) on our publicly available Arabic language health sentiment dataset 
Whenever human beings interact with each other, they exchange or express opinions, emotions, and sentiments. These opinions can be expressed in text, speech or images. Analysis of these sentiments is one of the popular research areas of present day researchers. Sentiment analysis, also known as opinion mining tries to identify or classify these sentiments or opinions into two broad categories - positive and negative. In recent years, the scientific community has taken a lot of interest in analyzing sentiment in textual data available in various social media platforms. Much work has been done on social media conversations, blog posts, newspaper articles and various narrative texts. However, when it comes to identifying emotions from scientific papers, researchers have faced some difficulties due to the implicit and hidden nature of opinion. By default, citation instances are considered inherently positive in emotion. Popular ranking and indexing paradigms often neglect the opinion present while citing. In this paper, we have tried to achieve three objectives. First, we try to identify the major sentiment in the citation text and assign a score to the instance. We have used a statistical classifier for this purpose. Secondly, we have proposed a new index (we shall refer to it hereafter as M-index) which takes into account both the quantitative and qualitative factors while scoring a paper. Thirdly, we developed a ranking of research papers based on the M-index. We also try to explain how the M-index impacts the ranking of scientific papers.
Imbalanced data commonly exists in real world, espacially in sentiment-related corpus, making it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express transitional emotion between two adjacent discourses with discourse markers like "but", "though", "while", etc, and the head discourse and the tail discourse 3 usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method, which first samples discourses according to transitional discourse markers and then validates sentimental polarities with the help of a pretrained attention-based model. Our method increases sample diversity in the first place, can serve as a upstream preprocessing part in data augmentation. We conduct experiments on three public sentiment datasets, with several frequently used algorithms. Results show that our method is found to be consistently effective, even in highly imbalanced scenario, and easily be integrated with oversampling method to boost the performance on imbalanced sentiment classification.
Metrics for measuring the citation sentiment changes were introduced. Citation sentiment changes can be observed from global citation sentiment sequences (GCSSs). With respect to a cited paper, the citation sentiment sequences were analysed across a collection of citing papers ordered by the published time. For analysing GCSSs, Eddy Dissipation Rate (EDR) was adopted, with the hypothesis that the GCSSs pattern differences can be spotted by EDR based method. Preliminary evidence showed that EDR based method holds the potential for analysing a publication's impact in a time series fashion.
We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embedding for words and sentences. These are subsequently incorporated in a modified Gated Recurrent Unit (GRU) for sentiment classification and extraction of topics bearing different sentiment polarities. Those topics emerge from the words' local topic embeddings learned by the internal attention of the GRU cells in the context of a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit and the experiments conducted on users' reviews which demonstrate classification performance on a par with the state-of-the-art methodologies for sentiment classification and topic coherence outperforming the current approaches for supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training.
Emojis, as a new way of conveying nonverbal cues, are widely adopted in computer-mediated communications. In this paper, first from a message sender perspective, we focus on people's motives in using four types of emojis -- positive, neutral, negative, and non-facial. We compare the willingness levels of using these emoji types for seven typical intentions that people usually apply nonverbal cues for in communication. The results of extensive statistical hypothesis tests not only report the popularities of the intentions, but also uncover the subtle differences between emoji types in terms of intended uses. Second, from a perspective of message recipients, we further study the sentiment effects of emojis, as well as their duplications, on verbal messages. Different from previous studies in emoji sentiment, we study the sentiments of emojis and their contexts as a whole. The experiment results indicate that the powers of conveying sentiment are different between four emoji types, and the sentiment effects of emojis vary in the contexts of different valences.
In order to maximize the applicability of sentiment analysis results, it is necessary to not only classify the overall sentiment (positive/negative) of a given document but also to identify the main words that contribute to the classification. However, most datasets for sentiment analysis only have the sentiment label for each document or sentence. In other words, there is no information about which words play an important role in sentiment classification. In this paper, we propose a method for identifying key words discriminating positive and negative sentences by using a weakly supervised learning method based on a convolutional neural network (CNN). In our model, each word is represented as a continuous-valued vector and each sentence is represented as a matrix whose rows correspond to the word vector used in the sentence. Then, the CNN model is trained using these sentence matrices as inputs and the sentiment labels as the output. Once the CNN model is trained, we implement the word attention mechanism that identifies high-contributing words to classification results with a class activation map, using the weights from the fully connected layer at the end of the learned CNN model. In order to verify the proposed methodology, we evaluated the classification accuracy and inclusion rate of polarity words using two movie review datasets. Experimental result show that the proposed model can not only correctly classify the sentence polarity but also successfully identify the corresponding words with high polarity scores.
While existing machine learning models have achieved great success for sentiment classification, they typically do not explicitly capture sentiment-oriented word interaction, which can lead to poor results for fine-grained analysis at the snippet level (a phrase or sentence). Factorization Machine provides a possible approach to learning element-wise interaction for recommender systems, but they are not directly applicable to our task due to the inability to model contexts and word sequences. In this work, we develop two Position-aware Factorization Machines which consider word interaction, context and position information. Such information is jointly encoded in a set of sentiment-oriented word interaction vectors. Compared to traditional word embeddings, SWI vectors explicitly capture sentiment-oriented word interaction and simplify the parameter learning. Experimental results show that while they have comparable performance with state-of-the-art methods for document-level classification, they benefit the snippet/sentence-level sentiment analysis.
Stock market forecasting is very important in the planning of business activities. Stock price prediction has attracted many researchers in multiple disciplines including computer science, statistics, economics, finance, and operations research. Recent studies have shown that the vast amount of online information in the public domain such as Wikipedia usage pattern, news stories from the mainstream media, and social media discussions can have an observable effect on investors opinions towards financial markets. The reliability of the computational models on stock market prediction is important as it is very sensitive to the economy and can directly lead to financial loss. In this paper, we retrieved, extracted, and analyzed the effects of news sentiments on the stock market. Our main contributions include the development of a sentiment analysis dictionary for the financial sector, the development of a dictionary-based sentiment analysis model, and the evaluation of the model for gauging the effects of news sentiments on stocks for the pharmaceutical market. Using only news sentiments, we achieved a directional accuracy of 70.59% in predicting the trends in short-term stock price movement.
The COVID-19 pandemic has affected societies and human health and well-being in various ways. In this study, we collected Reddit data from 2019 (pre-pandemic) and 2020 (pandemic) from the subreddits communities associated with 8 universities, applied natural language processing (NLP) techniques, and trained graphical neural networks with social media data, to study how the pandemic has affected people's emotions and psychological states compared to the pre-pandemic era. Specifically, we first applied a pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) to learn embedding from the semantic information of Reddit messages and trained a graph attention network (GAT) for sentiment classification. The usage of GAT allows us to leverage the relational information among the messages during training. We then applied subgroup-adaptive model stacking to combine the prediction probabilities from RoBERTa and GAT to yield the final classification on sentiment. With the manually labeled and model-predicted sentiment labels on the collected data, we applied a generalized linear mixed-effects model to estimate the effects of pandemic and online teaching on people's sentiment in a statistically significant manner. The results suggest the odds of negative sentiments in 2020 is $14.6\%$ higher than the odds in 2019 ($p$-value $<0.001$), and the odds of negative sentiments are $41.6\%$ higher with in-person teaching than with online teaching in 2020 ($p$-value $=0.037$) in the studied population.