Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Sentiment Analysis": models, code, and papers

Supervised Term Weighting Metrics for Sentiment Analysis in Short Text

Oct 10, 2016
Hussam Hamdan, Patrice Bellot, Frederic Bechet

Term weighting metrics assign weights to terms in order to discriminate the important terms from the less crucial ones. Due to this characteristic, these metrics have attracted growing attention in text classification and recently in sentiment analysis. Using the weights given by such metrics could lead to more accurate document representation which may improve the performance of the classification. While previous studies have focused on proposing or comparing different weighting metrics at two-classes document level sentiment analysis, this study propose to analyse the results given by each metric in order to find out the characteristics of good and bad weighting metrics. Therefore we present an empirical study of fifteen global supervised weighting metrics with four local weighting metrics adopted from information retrieval, we also give an analysis to understand the behavior of each metric by observing and analysing how each metric distributes the terms and deduce some characteristics which may distinguish the good and bad metrics. The evaluation has been done using Support Vector Machine on three different datasets: Twitter, restaurant and laptop reviews.


Sentiment Analysis for Sinhala Language using Deep Learning Techniques

Nov 14, 2020
Lahiru Senevirathne, Piyumal Demotte, Binod Karunanayake, Udyogi Munasinghe, Surangika Ranathunga

Due to the high impact of the fast-evolving fields of machine learning and deep learning, Natural Language Processing (NLP) tasks have further obtained comprehensive performances for highly resourced languages such as English and Chinese. However Sinhala, which is an under-resourced language with a rich morphology, has not experienced these advancements. For sentiment analysis, there exists only two previous research with deep learning approaches, which focused only on document-level sentiment analysis for the binary case. They experimented with only three types of deep learning models. In contrast, this paper presents a much comprehensive study on the use of standard sequence models such as RNN, LSTM, Bi-LSTM, as well as more recent state-of-the-art models such as hierarchical attention hybrid neural networks, and capsule networks. Classification is done at document-level but with more granularity by considering POSITIVE, NEGATIVE, NEUTRAL, and CONFLICT classes. A data set of 15059 Sinhala news comments, annotated with these four classes and a corpus consists of 9.48 million tokens are publicly released. This is the largest sentiment annotated data set for Sinhala so far.


Emoji-based Co-attention Network for Microblog Sentiment Analysis

Oct 27, 2021
Xiaowei Yuan, Jingyuan Hu, Xiaodan Zhang, Honglei Lv, Hao Liu

Emojis are widely used in online social networks to express emotions, attitudes, and opinions. As emotional-oriented characters, emojis can be modeled as important features of emotions towards the recipient or subject for sentiment analysis. However, existing methods mainly take emojis as heuristic information that fails to resolve the problem of ambiguity noise. Recent researches have utilized emojis as an independent input to classify text sentiment but they ignore the emotional impact of the interaction between text and emojis. It results that the emotional semantics of emojis cannot be fully explored. In this paper, we propose an emoji-based co-attention network that learns the mutual emotional semantics between text and emojis on microblogs. Our model adopts the co-attention mechanism based on bidirectional long short-term memory incorporating the text and emojis, and integrates a squeeze-and-excitation block in a convolutional neural network classifier to increase its sensitivity to emotional semantic features. Experimental results show that the proposed method can significantly outperform several baselines for sentiment analysis on short texts of social media.


Twitter Opinion Topic Model: Extracting Product Opinions from Tweets by Leveraging Hashtags and Sentiment Lexicon

Sep 21, 2016
Kar Wai Lim, Wray Buntine

Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state-of-the-art is achieved with Latent Dirichlet Allocation (LDA)-based model. Although social media data like tweets are laden with opinions, their "dirty" nature (as natural language) has discouraged researchers from applying LDA-based opinion model for product review mining. Tweets are often informal, unstructured and lacking labeled data such as categories and ratings, making it challenging for product opinion mining. In this paper, we propose an LDA-based opinion model named Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment analysis. TOTM leverages hashtags, mentions, emoticons and strong sentiment words that are present in tweets in its discovery process. It improves opinion prediction by modeling the target-opinion interaction directly, thus discovering target specific opinion words, neglected in existing approaches. Moreover, we propose a new formulation of incorporating sentiment prior information into a topic model, by utilizing an existing public sentiment lexicon. This is novel in that it learns and updates with the data. We conduct experiments on 9 million tweets on electronic products, and demonstrate the improved performance of TOTM in both quantitative evaluations and qualitative analysis. We show that aspect-based opinion analysis on massive volume of tweets provides useful opinions on products.

* Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM), pp. 1319-1328. ACM. 2014 
* CIKM paper 

WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarization With Wikipedia

Sep 18, 2012
Subhabrata Mukherjee, Pushpak Bhattacharyya

This paper describes a weakly supervised system for sentiment analysis in the movie review domain. The objective is to classify a movie review into a polarity class, positive or negative, based on those sentences bearing opinion on the movie alone. The irrelevant text, not directly related to the reviewer opinion on the movie, is left out of analysis. Wikipedia incorporates the world knowledge of movie-specific features in the system which is used to obtain an extractive summary of the review, consisting of the reviewer's opinions about the specific aspects of the movie. This filters out the concepts which are irrelevant or objective with respect to the given movie. The proposed system, WikiSent, does not require any labeled data for training. The only weak supervision arises out of the usage of resources like WordNet, Part-of-Speech Tagger and Sentiment Lexicons by virtue of their construction. WikiSent achieves a considerable accuracy improvement over the baseline and has a better or comparable accuracy to the existing semi-supervised and unsupervised systems in the domain, on the same dataset. We also perform a general movie review trend analysis using WikiSent to find the trend in movie-making and the public acceptance in terms of movie genre, year of release and polarity.

* Lecture Notes in Computer Science Volume 7523, 2012, pp 774-793 
* The paper is available at 

Domain Adaptation for Sentiment Analysis Using Increased Intraclass Separation

Jul 04, 2021
Mohammad Rostami, Aram Galstyan

Sentiment analysis is a costly yet necessary task for enterprises to study the opinions of their customers to improve their products and to determine optimal marketing strategies. Due to the existence of a wide range of domains across different products and services, cross-domain sentiment analysis methods have received significant attention. These methods mitigate the domain gap between different applications by training cross-domain generalizable classifiers which help to relax the need for data annotation for each domain. Most existing methods focus on learning domain-agnostic representations that are invariant with respect to both the source and the target domains. As a result, a classifier that is trained using the source domain annotated data would generalize well in a related target domain. We introduce a new domain adaptation method which induces large margins between different classes in an embedding space. This embedding space is trained to be domain-agnostic by matching the data distributions across the domains. Large intraclass margins in the source domain help to reduce the effect of "domain shift" on the classifier performance in the target domain. Theoretical and empirical analysis are provided to demonstrate that the proposed method is effective.


Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

Apr 21, 2022
Yan Ling, Jianfei Yu, Rui Xia

As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore the crossmodal alignment or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate to identify finegrained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLPMABSA), which is a unified multimodal encoder-decoder architecture for all the pretraining and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pretraining task. The source code is publicly released at

* Accepted by ACL 2022 (long paper) 

A New Statistical Approach for Comparing Algorithms for Lexicon Based Sentiment Analysis

Jun 20, 2019
Mateus Machado, Evandro Ruiz, Kuruvilla Joseph Abraham

Lexicon based sentiment analysis usually relies on the identification of various words to which a numerical value corresponding to sentiment can be assigned. In principle, classifiers can be obtained from these algorithms by comparison with human annotation, which is considered the gold standard. In practise this is difficult in languages such as Portuguese where there is a paucity of human annotated texts. Thus in order to compare algorithms, a next best step is to directly compare different algorithms with each other without referring to human annotation. In this paper we develop methods for a statistical comparison of algorithms which does not rely on human annotation or on known class labels. We will motivate the use of marginal homogeneity tests, as well as log linear models within the framework of maximum likelihood estimation We will also show how some uncertainties present in lexicon based sentiment analysis may be similar to those which occur in human annotated tweets. We will also show how the variability in the output of different algorithms is lexicon dependent, and quantify this variability in the output within the framework of log linear models.


SentiLSTM: A Deep Learning Approach for Sentiment Analysis of Restaurant Reviews

Nov 19, 2020
Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Iqbal H. Sarker

The amount of textual data generation has increased enormously due to the effortless access of the Internet and the evolution of various web 2.0 applications. These textual data productions resulted because of the people express their opinion, emotion or sentiment about any product or service in the form of tweets, Facebook post or status, blog write up, and reviews. Sentiment analysis deals with the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude toward a particular topic is positive, negative, or neutral. The impact of customer review is significant to perceive the customer attitude towards a restaurant. Thus, the automatic detection of sentiment from reviews is advantageous for the restaurant owners, or service providers and customers to make their decisions or services more satisfactory. This paper proposes, a deep learning-based technique (i.e., BiLSTM) to classify the reviews provided by the clients of the restaurant into positive and negative polarities. A corpus consists of 8435 reviews is constructed to evaluate the proposed technique. In addition, a comparative analysis of the proposed technique with other machine learning algorithms presented. The results of the evaluation on test dataset show that BiLSTM technique produced in the highest accuracy of 91.35%.

* 13 page, will appear in 20th International Conference on Hybrid Intelligent Systems (HIS 2020) 

Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

Mar 27, 2021
Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black

Sentiment analysis is an important task in understanding social media content like customer reviews, Twitter and Facebook feeds etc. In multilingual communities around the world, a large amount of social media text is characterized by the presence of Code-Switching. Thus, it has become important to build models that can handle code-switched data. However, annotated code-switched data is scarce and there is a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and show its applications for the specific use case of sentiment analysis of code-switched data. We use the power of pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, only using pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of the learning dynamics of the algorithm with the aim of answering the question - `Does our unsupervised model understand the Code-Switched languages or does it just learn its representations?'. Our unsupervised models compete well with their supervised counterparts, with their performance reaching within 1-7\% (weighted F1 scores) when compared to supervised models trained for a two class problem.