Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Sentiment Analysis": models, code, and papers

SGPT: Semantic Graphs based Pre-training for Aspect-based Sentiment Analysis

May 26, 2021
Yong Qian, Zhongqing Wang, Rong Xiao, Chen Chen, Haihong Tang

Previous studies show effective of pre-trained language models for sentiment analysis. However, most of these studies ignore the importance of sentimental information for pre-trained models.Therefore, we fully investigate the sentimental information for pre-trained models and enhance pre-trained language models with semantic graphs for sentiment analysis.In particular, we introduce Semantic Graphs based Pre-training(SGPT) using semantic graphs to obtain synonym knowledge for aspect-sentiment pairs and similar aspect/sentiment terms.We then optimize the pre-trained language model with the semantic graphs.Empirical studies on several downstream tasks show that proposed model outperforms strong pre-trained baselines. The results also show the effectiveness of proposed semantic graphs for pre-trained model.

* arXiv admin note: text overlap with arXiv:2005.05635 by other authors 

Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Jun 16, 2018
N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, S. Poria

Multimodal sentiment analysis is a very actively growing field of research. A promising area of opportunity in this field is to improve the multimodal fusion mechanism. We present a novel feature fusion strategy that proceeds in a hierarchical fashion, first fusing the modalities two in two and only then fusing all three modalities. On multimodal sentiment analysis of individual utterances, our strategy outperforms conventional concatenation of features by 1%, which amounts to 5% reduction in error rate. On utterance-level multimodal sentiment analysis of multi-utterance video clips, for which current state-of-the-art techniques incorporate contextual information from other utterances of the same clip, our hierarchical fusion gives up to 2.4% (almost 10% error rate reduction) over currently used concatenation. The implementation of our method is publicly available in the form of open-source code.

* Accepted for publication at Knowledge Based Systems 

Sentiment Analysis of Twitter Data: A Survey of Techniques

Apr 22, 2016
Vishal. A. Kharde, Prof. Sheetal. Sonawane

With the advancement of web technology and its growth, there is a huge volume of data present in the web for internet users and a lot of data is generated too. Internet has become a platform for online learning, exchanging ideas and sharing opinions. Social networking sites like Twitter, Facebook, Google+ are rapidly gaining popularity as they allow people to share and express their views about topics,have discussion with different communities, or post messages across the world. There has been lot of work in the field of sentiment analysis of twitter data. This survey focuses mainly on sentiment analysis of twitter data which is helpful to analyze the information in the tweets where opinions are highly unstructured, heterogeneous and are either positive or negative, or neutral in some cases. In this paper, we provide a survey and a comparative analyses of existing techniques for opinion mining like machine learning and lexicon-based approaches, together with evaluation metrics. Using various machine learning algorithms like Naive Bayes, Max Entropy, and Support Vector Machine, we provide a research on twitter data streams.General challenges and applications of Sentiment Analysis on Twitter are also discussed in this paper.

* International Journal of Computer Applications 139(11): 5-15, April 2016 
* 7 figures, 10 tables 

Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition

Apr 04, 2022
Md Sabbir Hossain, Nishat Nayla, Annajiat Alim Rasel

Product market demand analysis plays a significant role for originating business strategies due to its noticeable impact on the competitive business field. Furthermore, there are roughly 228 million native Bengali speakers, the majority of whom use Banglish text to interact with one another on social media. Consumers are buying and evaluating items on social media with Banglish text as social media emerges as an online marketplace for entrepreneurs. People use social media to find preferred smartphone brands and models by sharing their positive and bad experiences with them. For this reason, our goal is to gather Banglish text data and use sentiment analysis and named entity identification to assess Bangladeshi market demand for smartphones in order to determine the most popular smartphones by gender. We scraped product related data from social media with instant data scrapers and crawled data from Wikipedia and other sites for product information with python web scrapers. Using Python's Pandas and Seaborn libraries, the raw data is filtered using NLP methods. To train our datasets for named entity recognition, we utilized Spacey's custom NER model, Amazon Comprehend Custom NER. A tensorflow sequential model was deployed with parameter tweaking for sentiment analysis. Meanwhile, we used the Google Cloud Translation API to estimate the gender of the reviewers using the BanglaLinga library. In this article, we use natural language processing (NLP) approaches and several machine learning models to identify the most in-demand items and services in the Bangladeshi market. Our model has an accuracy of 87.99% in Spacy Custom Named Entity recognition, 95.51% in Amazon Comprehend Custom NER, and 87.02% in the Sequential model for demand analysis. After Spacy's study, we were able to manage 80% of mistakes related to misspelled words using a mix of Levenshtein distance and ratio algorithms.


Data Augmentation in a Hybrid Approach for Aspect-Based Sentiment Analysis

Mar 29, 2021
Tomas Liesting, Flavius Frasincar, Maria Mihaela Trusca

Data augmentation is a way to increase the diversity of available data by applying constrained transformations on the original data. This strategy has been widely used in image classification but has to the best of our knowledge not yet been used in aspect-based sentiment analysis (ABSA). ABSA is a text analysis technique that determines aspects and their associated sentiment in opinionated text. In this paper, we investigate the effect of data augmentation on a state-of-the-art hybrid approach for aspect-based sentiment analysis (HAABSA). We apply modified versions of easy data augmentation (EDA), backtranslation, and word mixup. We evaluate the proposed techniques on the SemEval 2015 and SemEval 2016 datasets. The best result is obtained with the adjusted version of EDA, which yields a 0.5 percentage point improvement on the SemEval 2016 dataset and 1 percentage point increase on the SemEval 2015 dataset compared to the original HAABSA model.

* The 36th ACM/SIGAPP Symposium On Applied Computing, Virtual Conference, March 22-March 26, 202 

Twitter Sentiment Analysis: Lexicon Method, Machine Learning Method and Their Combination

Sep 18, 2015
Olga Kolchyna, Tharsis T. P. Souza, Philip Treleaven, Tomaso Aste

This paper covers the two approaches for sentiment analysis: i) lexicon based method; ii) machine learning method. We describe several techniques to implement these approaches and discuss how they can be adopted for sentiment classification of Twitter messages. We present a comparative study of different lexicon combinations and show that enhancing sentiment lexicons with emoticons, abbreviations and social-media slang expressions increases the accuracy of lexicon-based classification for Twitter. We discuss the importance of feature generation and feature selection processes for machine learning sentiment classification. To quantify the performance of the main sentiment analysis methods over Twitter we run these algorithms on a benchmark Twitter dataset from the SemEval-2013 competition, task 2-B. The results show that machine learning method based on SVM and Naive Bayes classifiers outperforms the lexicon method. We present a new ensemble method that uses a lexicon based sentiment score as input feature for the machine learning approach. The combined method proved to produce more precise classifications. We also show that employing a cost-sensitive classifier for highly unbalanced datasets yields an improvement of sentiment classification performance up to 7%.

* Handbook of Sentiment Analysis in Finance. Mitra, G. and Yu, X. (Eds.). (2016). ISBN 1910571571 
* 32 pages, 5 figures 

A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges

Mar 02, 2022
Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing, Wai Lam

As an important fine-grained sentiment analysis problem, aspect-based sentiment analysis (ABSA), aiming to analyze and understand people's opinions at the aspect level, has been attracting considerable interest in the last decade. To handle ABSA in different scenarios, various tasks have been introduced for analyzing different sentiment elements and their relations, including the aspect term, aspect category, opinion term, and sentiment polarity. Unlike early ABSA works focusing on a single sentiment element, many compound ABSA tasks involving multiple elements have been studied in recent years for capturing more complete aspect-level sentiment information. However, a systematic review of various ABSA tasks and their corresponding solutions is still lacking, which we aim to fill in this survey. More specifically, we provide a new taxonomy for ABSA which organizes existing studies from the axes of concerned sentiment elements, with an emphasis on recent advances of compound ABSA tasks. From the perspective of solutions, we summarize the utilization of pre-trained language models for ABSA, which improved the performance of ABSA to a new stage. Besides, techniques for building more practical ABSA systems in cross-domain/lingual scenarios are discussed. Finally, we review some emerging topics and discuss some open challenges to outlook potential future directions of ABSA.


Overview of the Arabic Sentiment Analysis 2021 Competition at KAUST

Sep 29, 2021
Hind Alamro, Manal Alshehri, Basma Alharbi, Zuhair Khayyat, Manal Kalkatawi, Inji Ibrahim Jaber, Xiangliang Zhang

This paper provides an overview of the Arabic Sentiment Analysis Challenge organized by King Abdullah University of Science and Technology (KAUST). The task in this challenge is to develop machine learning models to classify a given tweet into one of the three categories Positive, Negative, or Neutral. From our recently released ASAD dataset, we provide the competitors with 55K tweets for training, and 20K tweets for validation, based on which the performance of participating teams are ranked on a leaderboard, The competition received in total 1247 submissions from 74 teams (99 team members). The final winners are determined by another private set of 20K tweets that have the same distribution as the training and validation set. In this paper, we present the main findings in the competition and summarize the methods and tools used by the top ranked teams. The full dataset of 100K labeled tweets is also released for public usage, at


Twitter Sentiment Analysis using Distributed Word and Sentence Representation

Apr 01, 2019
Dwarampudi Mahidhar Reddy, Dr. N V Subba Reddy, Dr. N V Subba Reddy

An important part of the information gathering and data analysis is to find out what people think about, either a product or an entity. Twitter is an opinion rich social networking site. The posts or tweets from this data can be used for mining people's opinions. The recent surge of activity in this area can be attributed to the computational treatment of data, which made opinion extraction and sentiment analysis easier. This paper classifies tweets into positive and negative sentiments, but instead of using traditional methods or preprocessing text data here we use the distributed representations of words and sentences to classify the tweets. We use Long Short Term Memory (LSTM) Networks, Convolutional Neural Networks (CNNs) and Artificial Neural Networks. The first two are used on Distributed Representation of words while the latter is used on the distributed representation of sentences. This paper achieves accuracies as high as 81%. It also suggests the best and optimal ways for creating distributed representations of words for sentiment analysis, out of the available methods.

* 8 pages, 5 figures, 6 tables 

Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector

Sep 12, 2020
Ayu Purwarianti, Ida Ayu Putu Ari Crisdayanti

Bidirectional Long Short-Term Memory Network (Bi-LSTM) has shown promising performance in sentiment classification task. It processes inputs as sequence of information. Due to this behavior, sentiment predictions by Bi-LSTM were influenced by words sequence and the first or last phrases of the texts tend to have stronger features than other phrases. Meanwhile, in the problem scope of Indonesian sentiment analysis, phrases that express the sentiment of a document might not appear in the first or last part of the document that can lead to incorrect sentiment classification. To this end, we propose the using of an existing document representation method called paragraph vector as additional input features for Bi-LSTM. This vector provides information context of the document for each sequence processing. The paragraph vector is simply concatenated to each word vector of the document. This representation also helps to differentiate ambiguous Indonesian words. Bi-LSTM and paragraph vector were previously used as separate methods. Combining the two methods has shown a significant performance improvement of Indonesian sentiment analysis model. Several case studies on testing data showed that the proposed method can handle the sentiment phrases position problem encountered by Bi-LSTM.

* 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA)