Sentiment analysis, a popular technique for opinion mining, has been used by the software engineering (SE) research community for tasks such as assessing app reviews, developer emotions in issue trackers and developer opinions on APIs. Past research indicates that state-of-the-art sentiment analysis techniques perform poorly on SE data. This is because sentiment analysis tools are often designed to work on non-technical documents such as movie reviews. In this study, we address the shortcomings of existing sentiment analysis techniques for SE texts by proposing a hierarchical model based on convolutional neural networks (CNNs) and long short-term memory (LSTM) networks trained on top of pre-trained word vectors. We assessed our model's performance and reliability by comparing it with a number of frequently used sentiment analysis tools on five gold standard datasets. Our results show that our model improves on the state of the art in accuracy on all datasets. We also show that labelling a small sample of a dataset and re-training our model yields better accuracy than using an unsupervised classifier.
The coronavirus pandemic has disrupted the normal course of life. People around the world have taken to social media to express their opinions and emotions about a phenomenon that has taken the world by storm. The social networking site Twitter saw an unprecedented increase in tweets related to the novel coronavirus in a very short span of time. This paper presents a global sentiment analysis of coronavirus-related tweets and shows how sentiment in different countries has changed over time. Furthermore, to determine the impact of the coronavirus on daily life, tweets related to Work From Home (WFH) and online learning were scraped and the change in their sentiment over time was observed. In addition, machine learning models such as Long Short-Term Memory (LSTM) networks and Artificial Neural Networks (ANN) were implemented for sentiment classification and their accuracies were determined. Exploratory data analysis was also performed on a dataset of daily confirmed case counts in a few of the worst-hit countries, to compare the change in sentiment with the change in cases from the start of the pandemic until June 2020.
This paper describes our deep learning-based approach to multilingual aspect-based sentiment analysis as part of SemEval 2016 Task 5. We use a convolutional neural network (CNN) for both aspect extraction and aspect-based sentiment analysis. We cast aspect extraction as a multi-label classification problem, outputting probabilities over aspects parameterized by a threshold. To determine the sentiment towards an aspect, we concatenate an aspect vector with every word embedding and apply a convolution over it. Our constrained system (unconstrained for English) achieves competitive results across all languages and domains, placing first or second in 5 and 7 out of 11 language-domain pairs for aspect category detection (slot 1) and sentiment polarity (slot 3) respectively, thereby demonstrating the viability of a deep learning-based approach for multilingual aspect-based sentiment analysis.
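The two mechanisms named in the abstract above (thresholded multi-label aspect detection, and concatenating an aspect vector with every word embedding before a convolution) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the toy vectors, and the single hand-written filter are all assumptions chosen for illustration, and a real system would learn these values with a deep learning library.

```python
from typing import Dict, List


def select_aspects(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decoding: keep every aspect whose probability meets the threshold."""
    return [aspect for aspect, p in probs.items() if p >= threshold]


def concat_aspect(word_embeddings: List[List[float]],
                  aspect_vector: List[float]) -> List[List[float]]:
    """Append the same aspect vector to each word embedding."""
    return [word + aspect_vector for word in word_embeddings]


def convolve_and_pool(rows: List[List[float]], filt: List[List[float]]) -> float:
    """Slide one filter over consecutive rows, then max-pool the responses.

    Assumes the input has at least as many rows as the filter.
    """
    width = len(filt)
    responses = []
    for start in range(len(rows) - width + 1):
        window = rows[start:start + width]
        responses.append(
            sum(x * w for row, frow in zip(window, filt) for x, w in zip(row, frow))
        )
    return max(responses)


# Illustrative values only (not taken from the paper).
probs = {"FOOD#QUALITY": 0.81, "SERVICE#GENERAL": 0.35, "AMBIENCE#GENERAL": 0.5}
print(select_aspects(probs))  # ['FOOD#QUALITY', 'AMBIENCE#GENERAL']

words = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]   # toy 2-d word embeddings
aspect = [0.5]                                  # toy 1-d aspect vector
augmented = concat_aspect(words, aspect)
filt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # one hypothetical filter of width 2
print(convolve_and_pool(augmented, filt))       # 5.0
```

Lowering the threshold trades precision for recall in aspect detection, which is presumably why the abstract treats it as a tunable parameter.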
This study aims to understand the South African political context by analysing the sentiments shared on Twitter during the local government elections. The analysis focuses on the discussions around four predominant political parties: ANC, DA, EFF and ActionSA. A semi-supervised, graph-based technique was used to label the vast amount of accessible Twitter data so that tweets could be classified into negative and positive sentiment. The tweets expressing negative sentiment were further analysed through latent topic extraction to uncover hidden topics of concern associated with each of the political parties. Our findings show that the general sentiment among South African Twitter users is negative towards all four parties, with the most negative sentiment directed at the current ruling party, the ANC, relating to concerns centred on corruption, incompetence and loadshedding.
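The abstract above does not say which graph-based technique was used for semi-supervised labelling. As one plausible illustration only, the sketch below uses simple label propagation: a few hand-labelled seed tweets keep their labels, and every unlabelled tweet repeatedly takes the weighted average score of its neighbours in a similarity graph. The graph, the edge weights, and the function name `propagate_labels` are hypothetical.

```python
from typing import Dict, List, Tuple


def propagate_labels(edges: List[Tuple[str, str, float]],
                     seeds: Dict[str, float],
                     iterations: int = 20) -> Dict[str, float]:
    """Spread seed scores (+1.0 positive, -1.0 negative) over an undirected graph."""
    neighbours: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, weight in edges:
        neighbours.setdefault(u, []).append((v, weight))
        neighbours.setdefault(v, []).append((u, weight))

    # Unlabelled nodes start at a neutral score of 0.0.
    scores = {node: seeds.get(node, 0.0) for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                updated[node] = seeds[node]  # seed labels stay clamped
            else:
                total = sum(w for _, w in nbrs)
                updated[node] = sum(scores[n] * w for n, w in nbrs) / total
        scores = updated
    return scores


# Toy chain of four tweets: t1 is labelled positive, t4 negative.
edges = [("t1", "t2", 1.0), ("t2", "t3", 1.0), ("t3", "t4", 1.0)]
scores = propagate_labels(edges, seeds={"t1": 1.0, "t4": -1.0})
labels = {t: ("positive" if s > 0 else "negative") for t, s in scores.items()}
print(labels)
```

In this toy chain, `t2` ends up positive because it sits next to the positive seed, and `t3` ends up negative for the symmetric reason.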
The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To this end, we test state-of-the-art Sinhala sentiment analysis models against a data set containing a decade's worth of Sinhala posts with millions of reactions. To establish benchmarks and identify the best model for Sinhala sentiment analysis, we also test, on the same data set configuration, other deep learning models designed for sentiment analysis. We report that the three-layer bidirectional LSTM model achieves an F1 score of 84.58% for Sinhala sentiment analysis, surpassing the current state-of-the-art model, Capsule B, which achieves an F1 score of only 82.04%. Further, since all the deep learning models show F1 scores above 75%, we conclude that Facebook reactions are suitable for predicting the sentiment of a text.
Automatically mining the sentiment tendency contained in natural language is a fundamental research problem for some artificial intelligence applications, where solutions alternate with challenges. Transfer learning and multi-task learning techniques have been leveraged to mitigate supervision sparsity and to let multiple heterogeneous domains collaborate, respectively. In recent years, the sensitive nature of users' private data has raised another challenge for sentiment classification, i.e., data privacy protection. In this paper, we resort to federated learning for multi-domain sentiment classification under the constraint that the corpora must be stored on decentralized devices. In view of the heterogeneous semantics across multiple parties and the peculiarities of word embeddings, we provide corresponding solutions. First, we propose a Knowledge Transfer Enhanced Private-Shared (KTEPS) framework for better model aggregation and personalization in federated sentiment classification. Second, we propose KTEPS$^\star$, which takes into account the rich semantics and large size of word embeddings and uses Projection-based Dimension Reduction (PDR) methods for privacy protection and efficient transmission simultaneously. We propose two federated sentiment classification scenarios based on public benchmarks and verify the superiority of our proposed methods with extensive experiments.
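The private-shared idea in the abstract above can be pictured with a generic federated averaging step. The sketch below is not KTEPS itself: it only shows, under assumed data structures, how a server might average the shared parameters of several clients (weighted by local data size) while each client's private parameters never leave the device. The dictionary keys and the function names are hypothetical, and knowledge transfer and PDR are deliberately omitted.

```python
from typing import Dict, List


def aggregate_shared(clients: List[Dict]) -> List[float]:
    """Weighted average of the clients' shared parameters (plain FedAvg)."""
    total = sum(c["num_examples"] for c in clients)
    dim = len(clients[0]["shared"])
    return [
        sum(c["shared"][i] * c["num_examples"] for c in clients) / total
        for i in range(dim)
    ]


def federated_round(clients: List[Dict]) -> List[float]:
    """One communication round: aggregate, then send the result back to clients."""
    global_shared = aggregate_shared(clients)
    for c in clients:
        c["shared"] = list(global_shared)  # private parameters are left untouched
    return global_shared


# Two toy clients standing in for two sentiment domains.
clients = [
    {"shared": [1.0, 2.0], "private": [9.0], "num_examples": 1},
    {"shared": [3.0, 6.0], "private": [-9.0], "num_examples": 3},
]
print(federated_round(clients))              # [2.5, 5.0]
print([c["private"] for c in clients])       # [[9.0], [-9.0]]
```

Only the `shared` vectors are transmitted in this sketch, which is the property that makes the split attractive when the raw corpora must stay on decentralized devices.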
Twitter sentiment analysis, which often focuses on predicting the polarity of tweets, has attracted increasing attention in recent years, in particular with the rise of deep learning (DL). In this paper, we propose a new task: predicting the predominant sentiment among (first-order) replies to a given tweet. To this end, we created RETWEET, a large dataset of tweets and replies manually annotated with sentiment labels. As a strong baseline, we propose a two-stage DL-based method. First, we create automatically labeled training data by applying a standard sentiment classifier to tweet replies and aggregating its predictions for each original tweet; our rationale is that individual errors made by the classifier are likely to cancel out in the aggregation step. Second, we use the automatically labeled data for supervised training of a neural network to predict reply sentiment from the original tweets. The resulting classifier is evaluated on the new RETWEET dataset, showing promising results, especially considering that it has been trained without any manually labeled data. Both the dataset and the baseline implementation are publicly available.
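The first stage of the baseline described above (classify each reply, then aggregate per original tweet) can be sketched as follows. The abstract does not specify the aggregation rule or the classifier, so the majority vote and the keyword-based `toy_classify` function here are assumptions that merely stand in for them.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

POSITIVE_WORDS = ("great", "love", "nice")      # toy lexicon, illustration only
NEGATIVE_WORDS = ("terrible", "awful", "bad")


def toy_classify(text: str) -> str:
    """Hypothetical stand-in for a standard sentiment classifier."""
    lowered = text.lower()
    if any(word in lowered for word in POSITIVE_WORDS):
        return "positive"
    if any(word in lowered for word in NEGATIVE_WORDS):
        return "negative"
    return "neutral"


def predominant_label(labels: List[str]) -> str:
    """Majority vote; ties go to the label that appeared first."""
    return Counter(labels).most_common(1)[0][0]


def build_training_set(replies_by_tweet: Dict[str, List[str]],
                       classify: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each original tweet with the predominant sentiment of its replies."""
    examples = []
    for tweet, replies in replies_by_tweet.items():
        if not replies:
            continue  # nothing to aggregate for tweets without replies
        labels = [classify(reply) for reply in replies]
        examples.append((tweet, predominant_label(labels)))
    return examples


data = {
    "tweet A": ["Great news!", "love this", "terrible idea"],
    "tweet B": ["awful", "so bad", "nice"],
    "tweet C": [],
}
print(build_training_set(data, toy_classify))
# [('tweet A', 'positive'), ('tweet B', 'negative')]
```

In the example, one misjudged or dissenting reply per tweet does not change the aggregated label, which mirrors the rationale that individual classifier errors tend to cancel out.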
Generating a vivid, novel, and diverse essay from only a few given topic words is a challenging natural language generation task. Previous work leaves two problems unsolved: neglect of the sentiment beneath the text and insufficient use of topic-related knowledge. Therefore, we propose a novel Sentiment-Controllable topic-to-essay generator with a Topic Knowledge Graph enhanced decoder, named SCTKG, which is based on the conditional variational autoencoder (CVAE) framework. We first inject sentiment information into the generator to control the sentiment of each sentence, which leads to more varied generated essays. Then we design a Topic Knowledge Graph enhanced decoder. Unlike existing models that use knowledge entities separately, our model treats the knowledge graph as a whole and encodes more structured, connected semantic information in the graph to generate a more relevant essay. Experimental results show that SCTKG can generate sentiment-controllable essays and outperforms the state-of-the-art approach in terms of topic relevance, fluency, and diversity in both automatic and human evaluation.