Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Sentiment Analysis": models, code, and papers

SentiLR: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis

Nov 06, 2019
Pei Ke, Haozhe Ji, Siyang Liu, Xiaoyan Zhu, Minlie Huang

Most of the existing pre-trained language representation models neglect to consider the linguistic knowledge of texts, whereas we argue that such knowledge can promote language understanding in various NLP tasks. In this paper, we propose a novel language representation model called SentiLR, which introduces word-level linguistic knowledge including part-of-speech tag and prior sentiment polarity from SentiWordNet to benefit the downstream tasks in sentiment analysis. During pre-training, we first acquire the prior sentiment polarity of each word by querying the SentiWordNet dictionary with its part-of-speech tag. Then, we devise a new pre-training task called label-aware masked language model (LA-MLM) consisting of two subtasks: 1) word knowledge recovering given the sentence-level label; 2) sentence-level label prediction with linguistic knowledge enhanced context. Experiments show that SentiLR achieves state-of-the-art performance on several sentence-level / aspect-level sentiment analysis tasks by fine-tuning, and also obtain comparative results on general language understanding tasks.

* 11 pages 
Access Paper or Ask Questions

C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering

Aug 09, 2020
Laksh Advani, Clement Lu, Suraj Maharjan

In today's interconnected and multilingual world, code-mixing of languages on social media is a common occurrence. While many Natural Language Processing (NLP) tasks like sentiment analysis are mature and well designed for monolingual text, techniques to apply these tasks to code-mixed text still warrant exploration. This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix. We tackle this problem by leveraging a set of hand-engineered lexical, sentiment, and metadata features to design a classifier that can disambiguate between "positive", "negative" and "neutral" sentiment. With this model, we are able to obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" tasks

* SemEval-2020 Task 9 
Access Paper or Ask Questions

SA2SL: From Aspect-Based Sentiment Analysis to Social Listening System for Business Intelligence

Jun 10, 2021
Luong Luc Phan, Phuc Huynh Pham, Kim Thi-Thanh Nguyen, Tham Thi Nguyen, Sieu Khai Huynh, Luan Thanh Nguyen, Tin Van Huynh, Kiet Van Nguyen

In this paper, we present a process of building a social listening system based on aspect-based sentiment analysis in Vietnamese from creating a dataset to building a real application. Firstly, we create UIT-ViSFD, a Vietnamese Smartphone Feedback Dataset as a new benchmark corpus built based on a strict annotation schemes for evaluating aspect-based sentiment analysis, consisting of 11,122 human-annotated comments for mobile e-commerce, which is freely available for research purposes. We also present a proposed approach based on the Bi-LSTM architecture with the fastText word embeddings for the Vietnamese aspect based sentiment task. Our experiments show that our approach achieves the best performances with the F1-score of 84.48% for the aspect task and 63.06% for the sentiment task, which performs several conventional machine learning and deep learning systems. Last but not least, we build SA2SL, a social listening system based on the best performance model on our dataset, which will inspire more social listening systems in future.

Access Paper or Ask Questions

Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages

Mar 11, 2018
Soumil Mandal, Sainik Kumar Mahata, Dipankar Das

Analysis of informative contents and sentiments of social users has been attempted quite intensively in the recent past. Most of the systems are usable only for monolingual data and fails or gives poor results when used on data with code-mixing property. To gather attention and encourage researchers to work on this crisis, we prepared gold standard Bengali-English code-mixed data with language and polarity tag for sentiment analysis purposes. In this paper, we discuss the systems we prepared to collect and filter raw Twitter data. In order to reduce manual work while annotation, hybrid systems combining rule based and supervised models were developed for both language and sentiment tagging. The final corpus was annotated by a group of annotators following a few guidelines. The gold standard corpus thus obtained has impressive inter-annotator agreement obtained in terms of Kappa values. Various metrics like Code-Mixed Index (CMI), Code-Mixed Factor (CF) along with various aspects (language and emotion) also qualitatively polled the code-mixed and sentiment properties of the corpus.

* The 13th Workshop on Asian Language Resources (ALR), collocated with LREC 2018 
Access Paper or Ask Questions

Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos

Apr 09, 2016
Moisés H. R. Pereira, Flávio L. C. Pádua, Adriano C. M. Pereira, Fabrício Benevenuto, Daniel H. Dalip

This paper presents a novel approach to perform sentiment analysis of news videos, based on the fusion of audio, textual and visual clues extracted from their contents. The proposed approach aims at contributing to the semiodiscoursive study regarding the construction of the ethos (identity) of this media universe, which has become a central part of the modern-day lives of millions of people. To achieve this goal, we apply state-of-the-art computational methods for (1) automatic emotion recognition from facial expressions, (2) extraction of modulations in the participants' speeches and (3) sentiment analysis from the closed caption associated to the videos of interest. More specifically, we compute features, such as, visual intensities of recognized emotions, field sizes of participants, voicing probability, sound loudness, speech fundamental frequencies and the sentiment scores (polarities) from text sentences in the closed caption. Experimental results with a dataset containing 520 annotated news videos from three Brazilian and one American popular TV newscasts show that our approach achieves an accuracy of up to 84% in the sentiments (tension levels) classification task, thus demonstrating its high potential to be used by media analysts in several applications, especially, in the journalistic domain.

* 5 pages, 1 figure, International AAAI Conference on Web and Social Media 
Access Paper or Ask Questions

Sentiment of Emojis

Dec 08, 2015
Petra Kralj Novak, Jasmina Smailović, Borut Sluban, Igor Mozetič

There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by the sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.

* PLoS ONE 10(12): e0144296, 2015 
Access Paper or Ask Questions

Data augmentation for low resource sentiment analysis using generative adversarial networks

Feb 18, 2019
Rahul Gupta

Sentiment analysis is a task that may suffer from a lack of data in certain cases, as the datasets are often generated and annotated by humans. In cases where data is inadequate for training discriminative models, generate models may aid training via data augmentation. Generative Adversarial Networks (GANs) are one such model that has advanced the state of the art in several tasks, including as image and text generation. In this paper, I train GAN models on low resource datasets, then use them for the purpose of data augmentation towards improving sentiment classifier generalization. Given the constraints of limited data, I explore various techniques to train the GAN models. I also present an analysis of the quality of generated GAN data as more training data for the GAN is made available. In this analysis, the generated data is evaluated as a test set (against a model trained on real data points) as well as a training set to train classification models. Finally, I also conduct a visual analysis by projecting the generated and the real data into a two-dimensional space using the t-Distributed Stochastic Neighbor Embedding (t-SNE) method.

* Accepted to International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019 
Access Paper or Ask Questions

Sentiment Analysis of Covid-19 Tweets using Evolutionary Classification-Based LSTM Model

Jun 13, 2021
Arunava Kumar Chakraborty, Sourav Das, Anup Kumar Kolya

As the Covid-19 outbreaks rapidly all over the world day by day and also affects the lives of million, a number of countries declared complete lock-down to check its intensity. During this lockdown period, social media plat-forms have played an important role to spread information about this pandemic across the world, as people used to express their feelings through the social networks. Considering this catastrophic situation, we developed an experimental approach to analyze the reactions of people on Twitter taking into ac-count the popular words either directly or indirectly based on this pandemic. This paper represents the sentiment analysis on collected large number of tweets on Coronavirus or Covid-19. At first, we analyze the trend of public sentiment on the topics related to Covid-19 epidemic using an evolutionary classification followed by the n-gram analysis. Then we calculated the sentiment ratings on collected tweet based on their class. Finally, we trained the long-short term network using two types of rated tweets to predict sentiment on Covid-19 data and obtained an overall accuracy of 84.46%.

* In RAAI 2020. Advances in Intelligent Systems and Computing, vol 1355 (2021) 
* 11 pages, 8 figures, 5 tables 
Access Paper or Ask Questions

A Comprehensive Survey on Aspect Based Sentiment Analysis

Jun 08, 2020
Kaustubh Yadav

Aspect Based Sentiment Analysis (ABSA) is the sub-field of Natural Language Processing that deals with essentially splitting our data into aspects ad finally extracting the sentiment information. ABSA is known to provide more information about the context than general sentiment analysis. In this study, our aim is to explore the various methodologies practiced while performing ABSA, and providing a comparative study. This survey paper discusses various solutions in-depth and gives a comparison between them. And is conveniently divided into sections to get a holistic view on the process.

Access Paper or Ask Questions