People use the World Wide Web heavily to share their experiences with entities such as products, services, or travel destinations. Texts that provide online feedback in the form of reviews and comments are essential for consumer decision-making. These comments are a valuable source for measuring satisfaction with products or services. Sentiment analysis is the task of identifying opinions expressed in such text fragments. In this work, we develop two methods that combine different types of word vectors to learn and estimate the polarity of reviews. We build average review vectors from word vectors and add weights to these review vectors using word frequencies in positive and negative sentiment-tagged reviews. We apply the methods to several datasets from different domains that are used as standard benchmarks for sentiment analysis. We ensemble the techniques with each other and with existing methods, and we compare them with the approaches in the literature. The results show that our approaches outperform the state-of-the-art success rates.
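The weighted averaging of word vectors described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy two-dimensional embeddings, the helper names (`count_words`, `polarity_weight`, `review_vector`), and the smoothed log-ratio weighting are all hypothetical choices made for the example.

```python
from collections import Counter
import math


def count_words(reviews):
    """Count word occurrences over a list of whitespace-tokenized reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(review.lower().split())
    return counts


def polarity_weight(word, pos_counts, neg_counts):
    """Hypothetical weight: absolute log-ratio of add-one smoothed counts.

    Words that occur equally often in positive and negative reviews get 0;
    words that are skewed towards one class get a larger value.
    """
    return abs(math.log((pos_counts[word] + 1) / (neg_counts[word] + 1)))


def review_vector(review, embeddings, pos_counts, neg_counts):
    """Weighted average of the word vectors of the known words in a review."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in review.lower().split():
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        weight = 1.0 + polarity_weight(word, pos_counts, neg_counts)
        for i, value in enumerate(embeddings[word]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [value / weight_sum for value in total]


# Toy data (assumed for illustration only).
embeddings = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "movie": [0.5, 0.5]}
pos_counts = count_words(["good movie", "good good"])
neg_counts = count_words(["bad movie"])
vector = review_vector("good movie", embeddings, pos_counts, neg_counts)
```

In this sketch the word "good" is skewed towards positive reviews, so it pulls the review vector further towards its own embedding than the neutral word "movie" does. The resulting vectors would then be passed to any standard classifier.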
Social scientists and psychologists take an interest in understanding how people express emotions or sentiments when dealing with catastrophic events such as natural disasters, political unrest, and terrorism. The COVID-19 pandemic is a catastrophic event that has raised a number of psychological issues, such as depression, given abrupt social changes and lack of employment. During the rise of COVID-19 cases and the accompanying stricter lockdowns, people have been expressing their sentiments on social media, which can provide a deep understanding of how people psychologically react to catastrophic events. In this paper, we use deep learning based language models via long short-term memory (LSTM) recurrent neural networks for sentiment analysis on Twitter, with a focus on the rise of novel cases in India. We use the LSTM model with global vectors (GloVe) for word representation in building a language model. We review the sentiments expressed for selected months covering the major peak of new cases in 2020. We present a framework that focuses on multi-label sentiment classification using an LSTM model and GloVe embeddings, where more than one sentiment can be expressed at once. Our results show that the majority of the tweets were positive, with high levels of optimism, during the rise of COVID-19 cases in India. We find that the number of tweets dropped significantly towards the peak of new cases. We find that optimistic and joking tweets dominated the monthly tweets and that far fewer negative sentiments were expressed. This could imply that the majority were generally positive, while some were annoyed at the way the authorities handled the pandemic as the peak was reached.
Sentiment analysis can provide a suitable lead for the tools used in software engineering, along with API recommendation systems and the relevant libraries to be used. In this context, existing tools such as SentiCR and SentiStrength-SE exhibit low F1-scores, which defeats the purpose of deploying such strategies; there is therefore ample scope for performance improvement. Recent advancements show that transformer based pre-trained models (e.g., BERT, RoBERTa, ALBERT) have displayed better results in the text classification task. Following this context, the present research explores different BERT-based models to analyze the sentences in GitHub comments, Jira comments, and Stack Overflow posts. The paper presents three strategies for applying BERT-based models to sentiment analysis: in the first strategy, the BERT-based pre-trained models are fine-tuned; in the second strategy, an ensemble model is developed from BERT variants; and in the third strategy, a compressed model (DistilBERT) is used. The experimental results show that the BERT-based ensemble approach and the compressed BERT model attain improvements of 6-12% over prevailing tools for the F1 measure on all three datasets.
With the current upsurge in the usage of social media platforms, the trend of using short text (microtext) in place of standard words has seen a significant rise. The usage of microtext poses a considerable performance issue in concept-level sentiment analysis, since models are trained on standard words. This paper discusses the impact of coupling sub-symbolic (phonetics) with symbolic (machine learning) Artificial Intelligence to transform out-of-vocabulary concepts into their standard in-vocabulary form. The phonetic distance is calculated using the Sørensen similarity algorithm. The phonetically similar in-vocabulary concepts thus obtained are then used to compute the correct polarity value, which was previously miscalculated because of the presence of microtext. Our proposed framework increases the accuracy of polarity detection by 6% compared to the earlier model. This also validates the fact that microtext normalization is a necessary prerequisite for the sentiment analysis task.
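The Sørensen similarity step can be illustrated with a short sketch. It assumes the common Sørensen-Dice formulation over character bigrams and, for simplicity, compares plain spellings; the abstract describes a phonetic comparison, so in practice the inputs would be phonetic encodings of the tokens. The function names and the small vocabulary are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return the list of overlapping character bigrams of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def sorensen_similarity(a, b):
    """Sørensen-Dice coefficient over bigram multisets, in the range [0, 1]."""
    grams_a, grams_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both strings are too short to have bigrams.
        return 1.0 if a == b else 0.0
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * overlap / total


def closest_in_vocabulary(token, vocabulary):
    """Map an out-of-vocabulary token to its most similar known word."""
    return max(vocabulary, key=lambda word: sorensen_similarity(token, word))


# Hypothetical vocabulary and microtext token, for illustration only.
vocabulary = ["tomorrow", "today", "tonight"]
normalized = closest_in_vocabulary("tomoro", vocabulary)
```

Here "tomoro" shares all five of its bigrams with "tomorrow" and only one with each of the other candidates, so it is mapped to "tomorrow" before the polarity lookup.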
This paper discusses the results obtained with different techniques for sentiment analysis of code-mixed social media (Twitter) text written in Hinglish. The stages involved in performing the sentiment analysis were data consolidation, data cleaning, data transformation, and modelling. Various data cleaning techniques were applied: the data was cleaned in five iterations, and the results of the experiments were recorded after each iteration. The data was transformed using count vectorizer, one-hot vectorizer, tf-idf vectorizer, doc2vec, word2vec, and fastText embeddings. The models were created using various machine learning algorithms such as SVM, KNN, Decision Trees, Random Forests, Naive Bayes, Logistic Regression, and ensemble voting classifiers. The data was obtained from a task on the CodaLab competition website, listed as Task 9 of the SemEval-2020 competition. The models were evaluated using the macro-averaged F1-score. The best F1-score of 69.07 was achieved using an ensemble voting classifier.
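The two evaluation ingredients named in this abstract, a voting ensemble and the macro-averaged F1-score, can be sketched in plain Python. This is an illustrative sketch only: it assumes hard (majority) voting, which the abstract does not specify, and the labels and per-model predictions are made-up toy data.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard voting.

    Ties are resolved in favour of the label seen first, i.e. the label
    predicted by the earliest model in the list.
    """
    voted = []
    for votes in zip(*predictions_per_model):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy predictions from three hypothetical classifiers.
y_true = ["pos", "neg", "neu", "pos"]
model_a = ["pos", "neg", "pos", "pos"]
model_b = ["pos", "pos", "neu", "neg"]
model_c = ["neg", "neg", "neu", "pos"]
ensemble = majority_vote([model_a, model_b, model_c])
score = macro_f1(y_true, ensemble)
```

Each toy model makes one mistake, but no two models err on the same tweet, so the majority vote recovers every label. Macro averaging gives each class equal weight regardless of its frequency, which matters when the sentiment classes are imbalanced.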
Current research is focusing on opinion mining, also called sentiment analysis, because of the sheer volume of opinion-rich web resources, such as discussion forums, review sites, and blogs, that are available in digital form. One important problem in sentiment analysis of product reviews is to produce a summary of opinions based on product features. In this paper, we survey and analyze various techniques that have been developed for the key tasks of opinion mining. On the basis of our survey and analysis, we provide an overall picture of what is involved in developing a software system for opinion mining.
Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text. However, existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects. To solve this problem, we develop a simple but effective approach to enrich ABSA test sets. Specifically, we generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect's sentiment. Based on the SemEval 2014 dataset, we construct the Aspect Robustness Test Set (ARTS) as a comprehensive probe of the aspect robustness of ABSA models. In human evaluation, over 92% of the ARTS data show high fluency and the desired sentiment on all aspects. Using ARTS, we analyze the robustness of nine ABSA models, and observe, surprisingly, that their accuracy drops by up to 69.73%. We explore several ways to improve aspect robustness, and find that adversarial training can improve models' performance on ARTS by up to 32.85%. Our code and new test set are available at https://github.com/zhijing-jin/ARTS_TestSet
Chinese sentiment analysis (CSA) has always been one of the challenges in natural language processing due to its complexity and uncertainty. The transformer has succeeded in capturing semantic features, but it relies on position encoding to capture sequence features, which is a significant shortcoming compared with recurrent models. In this paper, we propose T-E-GRU for Chinese sentiment analysis, which combines a transformer encoder and a GRU. We conducted experiments on three Chinese comment datasets. Given the inconsistent use of punctuation marks in Chinese comment texts, we selectively retain those punctuation marks that can segment sentences. The experimental results show that T-E-GRU outperforms the classic recurrent model and the recurrent model with attention.
Sentiment analysis has seen much progress in the past two decades. For the past few years, neural network approaches, primarily RNNs and CNNs, have been the most successful for this task. Recently, a new category of neural networks, self-attention networks (SANs), has been created, which uses the attention mechanism as its basic building block. Self-attention networks have been shown to be effective for sequence modeling tasks while having no recurrence or convolutions. In this work, we explore the effectiveness of SANs for sentiment analysis. We demonstrate that SANs are superior in performance to their RNN and CNN counterparts by comparing their classification accuracy on six datasets, as well as their model characteristics such as training speed and memory consumption. Finally, we explore the effects of various SAN modifications, such as multi-head attention, as well as two methods of incorporating sequence position information into SANs.
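The attention mechanism that SANs build on can be sketched as single-head scaled dot-product self-attention. The sketch below is a deliberate simplification and not the networks evaluated in this work: it uses identity projections (no learned query, key, or value weights), no multi-head split, and no position information, and the function names and toy input are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    `tokens` is a list of equal-length vectors. Each output vector is a
    weighted average of all input vectors, with weights derived from the
    scaled dot product between the current token and every token.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append([
            sum(weight * value[j] for weight, value in zip(weights, tokens))
            for j in range(dim)
        ])
    return outputs


# Two toy token vectors (assumed for illustration only).
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Every token attends to every other token in a single step, with no recurrence or convolution. Because nothing in this computation depends on token order, a real SAN has to add sequence position information separately.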