SOTA Transformer and DNN short text sentiment classifiers report over 97% accuracy on narrow domains like IMDB movie reviews. Real-world performance is significantly lower because traditional models overfit benchmarks and generalize poorly to different or more open domain texts. This paper introduces SentimentArcs, a new self-supervised time series sentiment analysis methodology that addresses the two main limitations of traditional supervised sentiment analysis: limited labeled training datasets and poor generalization. A large ensemble of diverse models provides a synthetic ground truth for self-supervised learning. Novel metrics jointly optimize an exhaustive search across every possible corpus:model combination. The joint optimization over both the corpus and model solves the generalization problem. Simple visualizations exploit the temporal structure in narratives so domain experts can quickly spot trends, identify key features, and note anomalies over hundreds of arcs and millions of data points. To our knowledge, this is the first self-supervised method for time series sentiment analysis and the largest survey directly comparing real-world model performance on long-form narratives.
Aspect-category sentiment analysis (ACSA) aims to predict sentiment polarities of sentences with respect to given aspect categories. To detect the sentiment toward a particular aspect category in a sentence, most previous methods first generate an aspect category-specific sentence representation for the aspect category, then predict the sentiment polarity based on the representation. These methods ignore the fact that the sentiment of an aspect category mentioned in a sentence is an aggregation of the sentiments of the words indicating the aspect category in the sentence, which leads to suboptimal performance. In this paper, we propose a Multi-Instance Multi-Label Learning Network for Aspect-Category sentiment analysis (AC-MIMLLN), which treats sentences as bags, words as instances, and the words indicating an aspect category as the key instances of the aspect category. Given a sentence and the aspect categories mentioned in the sentence, AC-MIMLLN first predicts the sentiments of the instances, then finds the key instances for the aspect categories, finally obtains the sentiments of the sentence toward the aspect categories by aggregating the key instance sentiments. Experimental results on three public datasets demonstrate the effectiveness of AC-MIMLLN.
Sentiment analysis, a popular technique for opinion mining, has been used by the software engineering research community for tasks such as assessing app reviews, developer emotions in issue trackers and developer opinions on APIs. Past research indicates that state-of-the-art sentiment analysis techniques have poor performance on SE data. This is because sentiment analysis tools are often designed to work on non-technical documents such as movie reviews. In this study, we attempt to solve the issues with existing sentiment analysis techniques for SE texts by proposing a hierarchical model based on convolutional neural networks (CNN) and long short-term memory (LSTM) trained on top of pre-trained word vectors. We assessed our model's performance and reliability by comparing it with a number of frequently used sentiment analysis tools on five gold standard datasets. Our results show that our model pushes the state of the art further on all datasets in terms of accuracy. We also show that it is possible to get better accuracy after labelling a small sample of the dataset and re-training our model rather than using an unsupervised classifier.
State of the art benchmarks for Twitter Sentiment Analysis do not consider the fact that for more than half of the tweets from the public stream a distinct sentiment cannot be chosen. This paper provides a new perspective on Twitter Sentiment Analysis by highlighting the necessity of explicitly incorporating uncertainty. Moreover, a dataset of high quality to evaluate solutions for this new problem is introduced and made publicly available.
Social networks are the main resources to gather information about people's opinion and sentiments towards different topics as they spend hours daily on social media and share their opinion. In this technical paper, we show the application of sentimental analysis and how to connect to Twitter and run sentimental analysis queries. We run experiments on different queries from politics to humanity and show the interesting results. We realized that the neutral sentiments for tweets are significantly high which clearly shows the limitations of the current works.
In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis. First, we investigate how useful a pre-trained language model would be in a 2-step pipeline approach employing Automatic Speech Recognition (ASR) and transcripts-based sentiment analysis separately. Second, we propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach to take advantage of a large, but unlabeled speech dataset for training. Although spoken and written texts have different linguistic characteristics, they can complement each other in understanding sentiment. Therefore, the proposed system can not only model acoustic characteristics to bear sentiment-specific information in speech signals, but learn latent information to carry sentiments in the text representation. In these experiments, we demonstrate the proposed approaches improve F1 scores consistently compared to systems without a language model. Moreover, we also show that the proposed framework can reduce 65% of human supervision by leveraging a large amount of data without human sentiment annotation and boost performance in a low-resource condition where the human sentiment annotation is not available enough.
The ability to identify sentiment in text, referred to as sentiment analysis, is one which is natural to adult humans. This task is, however, not one which a computer can perform by default. Identifying sentiments in an automated, algorithmic manner will be a useful capability for business and research in their search to understand what consumers think about their products or services and to understand human sociology. Here we propose two new Genetic Algorithms (GAs) for the task of automated text sentiment analysis. The GAs learn whether words occurring in a text corpus are either sentiment or amplifier words, and their corresponding magnitude. Sentiment words, such as 'horrible', add linearly to the final sentiment. Amplifier words in contrast, which are typically adjectives/adverbs like 'very', multiply the sentiment of the following word. This increases, decreases or negates the sentiment of the following word. The sentiment of the full text is then the sum of these terms. This approach grows both a sentiment and amplifier dictionary which can be reused for other purposes and fed into other machine learning algorithms. We report the results of multiple experiments conducted on large Amazon data sets. The results reveal that our proposed approach was able to outperform several public and/or commercial sentiment analysis algorithms.
Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets. It usually precedes tasks such as sentiment analysis and hate speech detection. For sentiment analysis in medium to low-resourced languages, previous research has demonstrated that a multilingual approach that resorts to machine translation can be competitive or superior to previous approaches to the task. We develop a zero-shot hashtag segmentation framework and demonstrate how it can be used to improve the accuracy of multilingual sentiment analysis pipelines. Our zero-shot framework establishes a new state-of-the-art for hashtag segmentation datasets, surpassing even previous approaches that relied on feature engineering and language models trained on in-domain data.
We report on the use of sentiment analysis on news and social media to analyze and predict the price of Bitcoin. Bitcoin is the leading cryptocurrency and has the highest market capitalization among digital currencies. Predicting Bitcoin values may help understand and predict potential market movement and future growth of the technology. Unlike (mostly) repeating phenomena like weather, cryptocurrency values do not follow a repeating pattern and mere past value of Bitcoin does not reveal any secret of future Bitcoin value. Humans follow general sentiments and technical analysis to invest in the market. Hence considering people's sentiment can give a good degree of prediction. We focus on using social sentiment as a feature to predict future Bitcoin value, and in particular, consider Google News and Reddit posts. We find that social sentiment gives a good estimate of how future Bitcoin values may move. We achieve the lowest test RMSE of 434.87 using an LSTM that takes as inputs the historical price of various cryptocurrencies, the sentiment of news articles and the sentiment of Reddit posts.
Web 2.0 services have enabled people to express their opinions, experience and feelings in the form of user-generated content. Sentiment analysis or opinion mining involves identifying, classifying and aggregating opinions as per their positive or negative polarity. This paper investigates the efficacy of different implementations of Self-Organizing Maps (SOM) for sentiment based visualization and classification of online reviews. Specifically, this paper implements the SOM algorithm for both supervised and unsupervised learning from text documents. The unsupervised SOM algorithm is implemented for sentiment based visualization and classification tasks. For supervised sentiment analysis, a competitive learning algorithm known as Learning Vector Quantization is used. Both algorithms are also compared with their respective multi-pass implementations where a quick rough ordering pass is followed by a fine tuning pass. The experimental results on the online movie review data set show that SOMs are well suited for sentiment based classification and sentiment polarity visualization.