In the recent decade, with the enormous growth of digital content in internet and databases, sentiment analysis has received more and more attention between information retrieval and natural language processing researchers. Sentiment analysis aims to use automated tools to detect subjective information from reviews. One of the main challenges in sentiment analysis is feature selection. Feature selection is widely used as the first stage of analysis and classification tasks to reduce the dimension of problem, and improve speed by the elimination of irrelevant and redundant features. Up to now as there are few researches conducted on feature selection in sentiment analysis, there are very rare works for Persian sentiment analysis. This paper considers the problem of sentiment classification using different feature selection methods for online customer reviews in Persian language. Three of the challenges of Persian text are using of a wide variety of declensional suffixes, different word spacing and many informal or colloquial words. In this paper we study these challenges by proposing a model for sentiment classification of Persian review documents. The proposed model is based on lemmatization and feature selection and is employed Naive Bayes algorithm for classification. We evaluate the performance of the model on a manually gathered collection of cellphone reviews, where the results show the effectiveness of the proposed approaches.
As an important fine-grained sentiment analysis problem, aspect-based sentiment analysis (ABSA), aiming to analyze and understand people's opinions at the aspect level, has been attracting considerable interest in the last decade. To handle ABSA in different scenarios, various tasks have been introduced for analyzing different sentiment elements and their relations, including the aspect term, aspect category, opinion term, and sentiment polarity. Unlike early ABSA works focusing on a single sentiment element, many compound ABSA tasks involving multiple elements have been studied in recent years for capturing more complete aspect-level sentiment information. However, a systematic review of various ABSA tasks and their corresponding solutions is still lacking, which we aim to fill in this survey. More specifically, we provide a new taxonomy for ABSA which organizes existing studies from the axes of concerned sentiment elements, with an emphasis on recent advances of compound ABSA tasks. From the perspective of solutions, we summarize the utilization of pre-trained language models for ABSA, which improved the performance of ABSA to a new stage. Besides, techniques for building more practical ABSA systems in cross-domain/lingual scenarios are discussed. Finally, we review some emerging topics and discuss some open challenges to outlook potential future directions of ABSA.
In this paper, we investigate the impact of the social media data in predicting the Tehran Stock Exchange (TSE) variables for the first time. We consider the closing price and daily return of three different stocks for this investigation. We collected our social media data from Sahamyab.com/stocktwits for about three months. To extract information from online comments, we propose a hybrid sentiment analysis approach that combines lexicon-based and learning-based methods. Since lexicons that are available for the Persian language are not practical for sentiment analysis in the stock market domain, we built a particular sentiment lexicon for this domain. After designing and calculating daily sentiment indices using the sentiment of the comments, we examine their impact on the baseline models that only use historical market data and propose new predictor models using multi regression analysis. In addition to the sentiments, we also examine the comments volume and the users' reliabilities. We conclude that the predictability of various stocks in TSE is different depending on their attributes. Moreover, we indicate that for predicting the closing price only comments volume and for predicting the daily return both the volume and the sentiment of the comments could be useful. We demonstrate that Users' Trust coefficients have different behaviors toward the three stocks.
Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions about the content of a given image. However, for human beings, image caption writing is quite flexible and diverse, where additional language dimensions, such as emotion, humor and language styles, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which has received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic matching between the visual content and the generated descriptions. In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our model outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can also easily manipulate the model by assigning different sentiments to the testing image to generate captions with the corresponding sentiments.
Bidirectional Long Short-Term Memory Network (Bi-LSTM) has shown promising performance in sentiment classification task. It processes inputs as sequence of information. Due to this behavior, sentiment predictions by Bi-LSTM were influenced by words sequence and the first or last phrases of the texts tend to have stronger features than other phrases. Meanwhile, in the problem scope of Indonesian sentiment analysis, phrases that express the sentiment of a document might not appear in the first or last part of the document that can lead to incorrect sentiment classification. To this end, we propose the using of an existing document representation method called paragraph vector as additional input features for Bi-LSTM. This vector provides information context of the document for each sequence processing. The paragraph vector is simply concatenated to each word vector of the document. This representation also helps to differentiate ambiguous Indonesian words. Bi-LSTM and paragraph vector were previously used as separate methods. Combining the two methods has shown a significant performance improvement of Indonesian sentiment analysis model. Several case studies on testing data showed that the proposed method can handle the sentiment phrases position problem encountered by Bi-LSTM.
Aspect sentiment triplet extraction (ASTE), which aims to identify aspects from review sentences along with their corresponding opinion expressions and sentiments, is an emerging task in fine-grained opinion mining. Since ASTE consists of multiple subtasks, including opinion entity extraction, relation detection, and sentiment classification, it is critical and challenging to appropriately capture and utilize the associations among them. In this paper, we transform ASTE task into a multi-turn machine reading comprehension (MTMRC) task and propose a bidirectional MRC (BMRC) framework to address this challenge. Specifically, we devise three types of queries, including non-restrictive extraction queries, restrictive extraction queries and sentiment classification queries, to build the associations among different subtasks. Furthermore, considering that an aspect sentiment triplet can derive from either an aspect or an opinion expression, we design a bidirectional MRC structure. One direction sequentially recognizes aspects, opinion expressions, and sentiments to obtain triplets, while the other direction identifies opinion expressions first, then aspects, and at last sentiments. By making the two directions complement each other, our framework can identify triplets more comprehensively. To verify the effectiveness of our approach, we conduct extensive experiments on four benchmark datasets. The experimental results demonstrate that BMRC achieves state-of-the-art performances.
Prior work on sentiment analysis using weak supervision primarily focuses on different reviews such as movies (IMDB), restaurants (Yelp), products (Amazon).~One under-explored field in this regard is customer chat data for a customer-agent chat in customer support due to the lack of availability of free public data. Here, we perform sentiment analysis on customer chat using weak supervision on our in-house dataset. We fine-tune the pre-trained language model (LM) RoBERTa as a sentiment classifier using weak supervision. Our contribution is as follows:1) We show that by using weak sentiment classifiers along with domain-specific lexicon-based rules as Labeling Functions (LF), we can train a fairly accurate customer chat sentiment classifier using weak supervision. 2) We compare the performance of our custom-trained model with off-the-shelf google cloud NLP API for sentiment analysis. We show that by injecting domain-specific knowledge using LFs, even with weak supervision, we can train a model to handle some domain-specific use cases better than off-the-shelf google cloud NLP API. 3) We also present an analysis of how customer sentiment in a chat relates to problem resolution.
Machine learning approaches in sentiment analysis principally rely on the abundance of resources. To limit this dependence, we propose a novel method called Siamese Network Architecture for Sentiment Analysis (SNASA) to learn representations of resource-poor languages by jointly training them with resource-rich languages using a siamese network. SNASA model consists of twin Bi-directional Long Short-Term Memory Recurrent Neural Networks (Bi-LSTM RNN) with shared parameters joined by a contrastive loss function, based on a similarity metric. The model learns the sentence representations of resource-poor and resource-rich language in a common sentiment space by using a similarity metric based on their individual sentiments. The model, hence, projects sentences with similar sentiment closer to each other and the sentences with different sentiment farther from each other. Experiments on large-scale datasets of resource-rich languages - English and Spanish and resource-poor languages - Hindi and Telugu reveal that SNASA outperforms the state-of-the-art sentiment analysis approaches based on distributional semantics, semantic rules, lexicon lists and deep neural network representations without sh
One crucial aspect of sentiment analysis is negation handling, where the occurrence of negation can flip the sentiment of a sentence and negatively affects the machine learning-based sentiment classification. The role of negation in Arabic sentiment analysis has been explored only to a limited extent, especially for colloquial Arabic. In this paper, the author addresses the negation problem of machine learning-based sentiment classification for a colloquial Arabic language. To this end, we propose a simple rule-based algorithm for handling the problem; the rules were crafted based on observing many cases of negation. Additionally, simple linguistic knowledge and sentiment lexicon are used for this purpose. The author also examines the impact of the proposed algorithm on the performance of different machine learning algorithms. The results given by the proposed algorithm are compared with three baseline models. The experimental results show that there is a positive impact on the classifiers accuracy, precision and recall when the proposed algorithm is used compared to the baselines.