Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Sentiment Analysis": models, code, and papers

Boost Phrase-level Polarity Labelling with Review-level Sentiment Classification

Feb 11, 2015
Yongfeng Zhang, Min Zhang, Yiqun Liu, Shaoping Ma

Sentiment analysis on user reviews helps to keep track of user reactions towards products, and make advices to users about what to buy. State-of-the-art review-level sentiment classification techniques could give pretty good precisions of above 90%. However, current phrase-level sentiment analysis approaches might only give sentiment polarity labelling precisions of around 70%~80%, which is far from satisfaction and restricts its application in many practical tasks. In this paper, we focus on the problem of phrase-level sentiment polarity labelling and attempt to bridge the gap between phrase-level and review-level sentiment analysis. We investigate the inconsistency between the numerical star ratings and the sentiment orientation of textual user reviews. Although they have long been treated as identical, which serves as a basic assumption in previous work, we find that this assumption is not necessarily true. We further propose to leverage the results of review-level sentiment classification to boost the performance of phrase-level polarity labelling using a novel constrained convex optimization framework. Besides, the framework is capable of integrating various kinds of information sources and heuristics, while giving the global optimal solution due to its convexity. Experimental results on both English and Chinese reviews show that our framework achieves high labelling precisions of up to 89%, which is a significant improvement from current approaches.


Sentiment analysis in Bengali via transfer learning using multi-lingual BERT

Dec 03, 2020
Khondoker Ittehadul Islam, Md. Saiful Islam, Md Ruhul Amin

Sentiment analysis (SA) in Bengali is challenging due to this Indo-Aryan language's highly inflected properties with more than 160 different inflected forms for verbs and 36 different forms for noun and 24 different forms for pronouns. The lack of standard labeled datasets in the Bengali domain makes the task of SA even harder. In this paper, we present manually tagged 2-class and 3-class SA datasets in Bengali. We also demonstrate that the multi-lingual BERT model with relevant extensions can be trained via the approach of transfer learning over those novel datasets to improve the state-of-the-art performance in sentiment classification tasks. This deep learning model achieves an accuracy of 71\% for 2-class sentiment classification compared to the current state-of-the-art accuracy of 68\%. We also present the very first Bengali SA classifier for the 3-class manually tagged dataset, and our proposed model achieves an accuracy of 60\%. We further use this model to analyze the sentiment of public comments in the online daily newspaper. Our analysis shows that people post negative comments for political or sports news more often, while the religious article comments represent positive sentiment. The dataset and code is publicly available at\_Sentiment.

* 5 pages 

DravidianMultiModality: A Dataset for Multi-modal Sentiment Analysis in Tamil and Malayalam

Jun 09, 2021
Bharathi Raja Chakravarthi, Jishnu Parameswaran P. K, Premjith B, K. P Soman, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kingston Pal Thamburaj, John P. McCrae

Human communication is inherently multimodal and asynchronous. Analyzing human emotions and sentiment is an emerging field of artificial intelligence. We are witnessing an increasing amount of multimodal content in local languages on social media about products and other topics. However, there are not many multimodal resources available for under-resourced Dravidian languages. Our study aims to create a multimodal sentiment analysis dataset for the under-resourced Tamil and Malayalam languages. First, we downloaded product or movies review videos from YouTube for Tamil and Malayalam. Next, we created captions for the videos with the help of annotators. Then we labelled the videos for sentiment, and verified the inter-annotator agreement using Fleiss's Kappa. This is the first multimodal sentiment analysis dataset for Tamil and Malayalam by volunteer annotators.

* 31 

The MuSe 2021 Multimodal Sentiment Analysis Challenge: Sentiment, Emotion, Physiological-Emotion, and Stress

Apr 14, 2021
Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, Björn W. Schuller

Multimodal Sentiment Analysis (MuSe) 2021 is a challenge focusing on the tasks of sentiment and emotion, as well as physiological-emotion and emotion-based stress recognition through more comprehensively integrating the audio-visual, language, and biological signal modalities. The purpose of MuSe 2021 is to bring together communities from different disciplines; mainly, the audio-visual emotion recognition community (signal-based), the sentiment analysis community (symbol-based), and the health informatics community. We present four distinct sub-challenges: MuSe-Wilder and MuSe-Stress which focus on continuous emotion (valence and arousal) prediction; MuSe-Sent, in which participants recognise five classes each for valence and arousal; and MuSe-Physio, in which the novel aspect of `physiological-emotion' is to be predicted. For this years' challenge, we utilise the MuSe-CaR dataset focusing on user-generated reviews and introduce the Ulm-TSST dataset, which displays people in stressful depositions. This paper also provides detail on the state-of-the-art feature sets extracted from these datasets for utilisation by our baseline model, a Long Short-Term Memory-Recurrent Neural Network. For each sub-challenge, a competitive baseline for participants is set; namely, on test, we report a Concordance Correlation Coefficient (CCC) of .4616 CCC for MuSe-Wilder; .4717 CCC for MuSe-Stress, and .4606 CCC for MuSe-Physio. For MuSe-Sent an F1 score of 32.82 % is obtained.


A Study of Feature Extraction techniques for Sentiment Analysis

Jun 04, 2019
Avinash Madasu, Sivasankar E

Sentiment Analysis refers to the study of systematically extracting the meaning of subjective text . When analysing sentiments from the subjective text using Machine Learning techniques,feature extraction becomes a significant part. We perform a study on the performance of feature extraction techniques TF-IDF(Term Frequency-Inverse Document Frequency) and Doc2vec (Document to Vector) using Cornell movie review datasets, UCI sentiment labeled datasets, stanford movie review datasets,effectively classifying the text into positive and negative polarities by using various pre-processing methods like eliminating StopWords and Tokenization which increases the performance of sentiment analysis in terms of accuracy and time taken by the classifier.The features obtained after applying feature extraction techniques on the text sentences are trained and tested using the classifiers Logistic Regression,Support Vector Machines,K-Nearest Neighbours , Decision Tree and Bernoulli Nave Bayes


M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Mar 23, 2022
Huisheng Mao, Ziqi Yuan, Hua Xu, Wenmeng Yu, Yihe Liu, Kai Gao

M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate advanced research by providing flexible toolkits, reliable benchmarks, and intuitive demonstrations. The platform features a fully modular video sentiment analysis framework consisting of data management, feature extraction, model training, and result analysis modules. In this paper, we first illustrate the overall architecture of the M-SENA platform and then introduce features of the core modules. Reliable baseline results of different modality features and MSA benchmarks are also reported. Moreover, we use model evaluation and analysis tools provided by M-SENA to present intermediate representation visualization, on-the-fly instance test, and generalization ability test results. The source code of the platform is publicly available at

* 11 pages, 4 figures, to be published in ACL 2022 System Demonstration Track 

Improving Document-Level Sentiment Classification Using Importance of Sentences

Mar 09, 2021
Gihyeon Choi, Shinhyeok Oh, Harksoo Kim

Previous researchers have considered sentiment analysis as a document classification task, in which input documents are classified into predefined sentiment classes. Although there are sentences in a document that support important evidences for sentiment analysis and sentences that do not, they have treated the document as a bag of sentences. In other words, they have not considered the importance of each sentence in the document. To effectively determine polarity of a document, each sentence in the document should be dealt with different degrees of importance. To address this problem, we propose a document-level sentence classification model based on deep neural networks, in which the importance degrees of sentences in documents are automatically determined through gate mechanisms. To verify our new sentiment analysis model, we conducted experiments using the sentiment datasets in the four different domains such as movie reviews, hotel reviews, restaurant reviews, and music reviews. In the experiments, the proposed model outperformed previous state-of-the-art models that do not consider importance differences of sentences in a document. The experimental results show that the importance of sentences should be considered in a document-level sentiment classification task.

* Entropy, Vol.22(12), pp.1-11, 2020.11 
* 12 pages, 7 figures, 5 tables 

Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks

Apr 03, 2018
Nurendra Choudhary, Rajat Singh, Ishita Bindlish, Manish Shrivastava

Machine learning approaches in sentiment analysis principally rely on the abundance of resources. To limit this dependence, we propose a novel method called Siamese Network Architecture for Sentiment Analysis (SNASA) to learn representations of resource-poor languages by jointly training them with resource-rich languages using a siamese network. SNASA model consists of twin Bi-directional Long Short-Term Memory Recurrent Neural Networks (Bi-LSTM RNN) with shared parameters joined by a contrastive loss function, based on a similarity metric. The model learns the sentence representations of resource-poor and resource-rich language in a common sentiment space by using a similarity metric based on their individual sentiments. The model, hence, projects sentences with similar sentiment closer to each other and the sentences with different sentiment farther from each other. Experiments on large-scale datasets of resource-rich languages - English and Spanish and resource-poor languages - Hindi and Telugu reveal that SNASA outperforms the state-of-the-art sentiment analysis approaches based on distributional semantics, semantic rules, lexicon lists and deep neural network representations without sh

* Accepted Long Paper at 19th International Conference on Computational Linguistics and Intelligent Text Processing, March 2018, Hanoi, Vietnam. arXiv admin note: text overlap with arXiv:1804.00806 

Enhancing Event-Level Sentiment Analysis with Structured Arguments

May 31, 2022
Qi Zhang, Jie Zhou, Qin Chen, Qinchun Bai, Liang He

Previous studies about event-level sentiment analysis (SA) usually model the event as a topic, a category or target terms, while the structured arguments (e.g., subject, object, time and location) that have potential effects on the sentiment are not well studied. In this paper, we redefine the task as structured event-level SA and propose an End-to-End Event-level Sentiment Analysis ($\textit{E}^{3}\textit{SA}$) approach to solve this issue. Specifically, we explicitly extract and model the event structure information for enhancing event-level SA. Extensive experiments demonstrate the great advantages of our proposed approach over the state-of-the-art methods. Noting the lack of the dataset, we also release a large-scale real-world dataset with event arguments and sentiment labelling for promoting more researches\footnote{The dataset is available at}.


ImpactCite: An XLNet-based method for Citation Impact Analysis

May 05, 2020
Dominique Mercier, Syed Tahseen Raza Rizvi, Vikas Rajashekar, Andreas Dengel, Sheraz Ahmed

Citations play a vital role in understanding the impact of scientific literature. Generally, citations are analyzed quantitatively whereas qualitative analysis of citations can reveal deeper insights into the impact of a scientific artifact in the community. Therefore, citation impact analysis (which includes sentiment and intent classification) enables us to quantify the quality of the citations which can eventually assist us in the estimation of ranking and impact. The contribution of this paper is two-fold. First, we benchmark the well-known language models like BERT and ALBERT along with several popular networks for both tasks of sentiment and intent classification. Second, we provide ImpactCite, which is XLNet-based method for citation impact analysis. All evaluations are performed on a set of publicly available citation analysis datasets. Evaluation results reveal that ImpactCite achieves a new state-of-the-art performance for both citation intent and sentiment classification by outperforming the existing approaches by 3.44% and 1.33% in F1-score. Therefore, we emphasize ImpactCite (XLNet-based solution) for both tasks to better understand the impact of a citation. Additional efforts have been performed to come up with CSC-Clean corpus, which is a clean and reliable dataset for citation sentiment classification.

* 12 pages (10 + 2 references), 1 figure