"Sentiment Analysis": models, code, and papers

Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis

Mar 14, 2022
Yiming Zhang, Min Zhang, Sai Wu, Junbo Zhao

Aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to determine the sentiment polarity towards targeted aspect terms occurring in a sentence. The development of the ABSA task is severely hindered by the lack of annotated data. To tackle this, prior work has studied the possibility of using sentiment analysis (SA) datasets to assist in training the ABSA model, primarily via pretraining or multi-task learning. In this article, we follow this line of work and, for the first time, apply the Pseudo-Label (PL) method to merge the two homogeneous tasks. While it seems straightforward to use generated pseudo labels to handle this case of label granularity unification for two highly related tasks, we identify its major challenge in this paper and propose a novel framework, dubbed Dual-granularity Pseudo Labeling (DPL). Further, similar to PL, we regard DPL as a general framework capable of combining other prior methods in the literature. Through extensive experiments, DPL achieves state-of-the-art performance on standard benchmarks, surpassing prior work significantly.


Curriculum Learning Strategies for Hindi-English Codemixed Sentiment Analysis

Jun 18, 2019
Anirudh Dahiya, Neeraj Battan, Manish Shrivastava, Dipti Mishra Sharma

Sentiment analysis and other semantic tasks are commonly used for social media textual analysis to gauge public opinion and make sense of the noise on social media. The language used on social media not only commonly diverges from formal language, but is further complicated by codemixing between languages, especially in large multilingual societies like India. Traditional methods for learning semantic NLP tasks have long relied on end-to-end task-specific training, which requires an expensive data creation process, even more so for deep learning methods. This challenge is even more severe for resource-scarce texts like codemixed language pairs, which lack well-learnt representations as model priors and whose task-specific datasets can be too few and too small to efficiently exploit recent deep learning approaches. To address the above challenges, we introduce curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts, and investigate various training strategies for enhancing model performance. Our method outperforms the state-of-the-art methods for Hi-En codemixed sentiment analysis by 3.31% accuracy, and also shows better model robustness in terms of convergence and variance in test performance.


A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

Apr 16, 2022
Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, Dacheng Tao

Aspect-Based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task that focuses on detecting the sentiment polarity towards an aspect in a sentence. However, it is sensitive to the multi-aspect challenge, where the features of multiple aspects in a sentence affect each other. To mitigate this issue, we design a novel training framework, called Contrastive Cross-Channel Data Augmentation (C3DA). A source sentence is fed to a domain-specific generator to obtain synthetic sentences, and is then concatenated with these generated sentences to conduct supervised training and the proposed contrastive training. Specifically, considering the limited ABSA labeled data, we also introduce parameter-efficient approaches to carry out sentence generation. This novel generation method consists of an Aspect Augmentation Channel (AAC) to generate aspect-specific sentences and a Polarity Augmentation Channel (PAC) to generate polarity-inverted sentences. According to our extensive experiments, our C3DA framework outperforms baselines without any augmentation by about 1% on accuracy and Macro-F1.


A Hybrid Approach for Aspect-Based Sentiment Analysis Using Deep Contextual Word Embeddings and Hierarchical Attention

Apr 18, 2020
Maria Mihaela Trusca, Daan Wassenberg, Flavius Frasincar, Rommert Dekker

The Web has become the main platform where people express their opinions about entities of interest and their associated aspects. Aspect-Based Sentiment Analysis (ABSA) aims to automatically compute the sentiment towards these aspects from opinionated text. In this paper we extend the state-of-the-art Hybrid Approach for Aspect-Based Sentiment Analysis (HAABSA) method in two directions. First, we replace the non-contextual word embeddings with deep contextual word embeddings in order to better cope with word semantics in a given text. Second, we use hierarchical attention by adding an extra attention layer to the HAABSA high-level representations in order to increase the method's flexibility in modeling the input data. Using two standard datasets (SemEval 2015 and SemEval 2016), we show that the proposed extensions improve the accuracy of the built model for ABSA.

* Accepted for publication in the 20th International Conference on Web Engineering (ICWE 2020), Helsinki, Finland, 9-12 June 2020

IITK at SemEval-2020 Task 8: Unimodal and Bimodal Sentiment Analysis of Internet Memes

Jul 21, 2020
Vishal Keswani, Sakshi Singh, Suryansh Agarwal, Ashutosh Modi

Social media is abundant in visual and textual information presented together or in isolation. Memes are the most popular form, belonging to the former class. In this paper, we present our approaches for the Memotion Analysis problem as posed in SemEval-2020 Task 8. The goal of this task is to classify memes based on their emotional content and sentiment. We leverage techniques from Natural Language Processing (NLP) and Computer Vision (CV) towards the sentiment classification of internet memes (Subtask A). We consider Bimodal (text and image) as well as Unimodal (text-only) techniques in our study, ranging from the Naïve Bayes classifier to Transformer-based approaches. Our results show that a text-only approach, a simple Feed Forward Neural Network (FFNN) with Word2vec embeddings as input, outperforms all the others. We stand first in the Sentiment analysis task with a relative improvement of 63% over the baseline macro-F1 score. Our work is relevant to any task concerned with the combination of different modalities.

* 7 pages, 2 figures, 3 tables. Accepted at Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020) 

Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media

Jan 31, 2018
Semiha Makinist, Ibrahim Riza Hallac, Betul Ay Karakus, Galip Aydin

A public dataset with a variety of properties suitable for sentiment analysis [1], event prediction, trend detection and other text mining applications is needed in order to perform analysis studies successfully. The vast majority of data on social media is text-based, and it is not possible to apply machine learning processes directly to this raw data, since several different processes are required to prepare the data before the algorithms are run. For example, different misspellings of the same word enlarge the word vector space unnecessarily, which reduces the success of the algorithm and increases the computational power requirement. This paper presents an improved Turkish dataset with an effective spelling correction algorithm based on Hadoop [2]. The collected data is recorded on the Hadoop Distributed File System and the text-based data is processed with the MapReduce programming model. This method is suitable for the storage and processing of large text-based social media data. In this study, movie reviews have been automatically recorded with Apache ManifoldCF (MCF) [3] and data clusters have been created. Various methods, such as Levenshtein distance and fuzzy string matching, have been compared to create a public dataset from the collected data. Experimental results show that the proposed algorithm successfully detects and corrects spelling errors, and that the resulting data can be used as an open-source dataset in sentiment analysis studies.

* Presented at CMES2017 
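
The abstract above names Levenshtein distance as one of the string-matching methods compared for spelling correction. As a minimal sketch of that idea only (not the authors' Hadoop/MapReduce implementation), the snippet below computes the edit distance between two words and maps a misspelled token to its closest vocabulary entry. The function names, the `max_distance` threshold and the three-word Turkish vocabulary are assumptions made for illustration.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary entry, or the word itself if none is close.

    max_distance is a hypothetical threshold, not a value from the paper.
    """
    candidates = list(vocabulary)
    if not candidates:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(candidates, key=lambda v: (levenshtein(word, v), v))
    return best if levenshtein(word, best) <= max_distance else word


# Hypothetical toy vocabulary ("wonderful", "film", "beautiful").
vocab = ["harika", "film", "güzel"]
fixed = correct("harikaa", vocab)      # one extra letter, so it is corrected
unchanged = correct("xyzxyz", vocab)   # too far from every entry, so it is kept
```

Collapsing misspellings onto one canonical form in this way is what keeps the word vector space from growing unnecessarily, which is the motivation the abstract gives for the correction step.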

INSIGHT-1 at SemEval-2016 Task 4: Convolutional Neural Networks for Sentiment Classification and Quantification

Sep 09, 2016
Sebastian Ruder, Parsa Ghaffari, John G. Breslin

This paper describes our deep learning-based approach to sentiment analysis in Twitter as part of SemEval-2016 Task 4. We use a convolutional neural network to determine sentiment and participate in all subtasks, i.e. two-point, three-point, and five-point scale sentiment classification and two-point and five-point scale sentiment quantification. We achieve competitive results for two-point scale sentiment classification and quantification, ranking fifth and a close fourth (third and second by alternative metrics) respectively despite using only pre-trained embeddings that contain no sentiment information. We achieve good performance on three-point scale sentiment classification, ranking eighth out of 35, while performing poorly on five-point scale sentiment classification and quantification. An error analysis reveals that this is due to low expressiveness of the model to capture negative sentiment as well as an inability to take into account ordinal information. We propose improvements in order to address these and other issues.

* Proceedings of SemEval (2016): 178-182 
* Published in Proceedings of SemEval-2016, 5 pages 

SentiWords: Deriving a High Precision and High Coverage Lexicon for Sentiment Analysis

Oct 30, 2015
Lorenzo Gatti, Marco Guerini, Marco Turchi

Deriving prior polarity lexica for sentiment analysis - where positive or negative scores are associated with words out of context - is a challenging task. Usually, a trade-off between precision and coverage is hard to find, and it depends on the methodology used to build the lexicon. Manually annotated lexica provide a high precision but lack in coverage, whereas automatic derivation from pre-existing knowledge guarantees high coverage at the cost of a lower precision. Since the automatic derivation of prior polarities is less time consuming than manual annotation, there has been a great bloom of these approaches, in particular based on the SentiWordNet resource. In this paper, we compare the most frequently used techniques based on SentiWordNet with newer ones and blend them in a learning framework (a so called 'ensemble method'). By taking advantage of manually built prior polarity lexica, our ensemble method is better able to predict the prior value of unseen words and to outperform all the other SentiWordNet approaches. Using this technique we have built SentiWords, a prior polarity lexicon of approximately 155,000 words, that has both a high precision and a high coverage. We finally show that in sentiment analysis tasks, using our lexicon allows us to outperform both the single metrics derived from SentiWordNet and popular manually annotated sentiment lexica.

* in Affective Computing, IEEE Transactions on (2015) 

Hindi/Bengali Sentiment Analysis Using Transfer Learning and Joint Dual Input Learning with Self Attention

Feb 11, 2022
Shahrukh Khan, Mahnoor Shahid

Sentiment analysis typically refers to using natural language processing, text analysis and computational linguistics to extract affect- and emotion-based information from text data. Our work explores how deep neural networks can be used in transfer learning and joint dual input learning settings to classify sentiments and detect hate speech in Hindi and Bengali data. We start by training Word2Vec word embeddings on the Hindi HASOC dataset and on Bengali hate speech data, and then train LSTM classifiers. Subsequently, we apply parameter-sharing-based transfer learning to the Bengali sentiment classifiers by reusing and fine-tuning the trained weights of the Hindi classifiers, with both classifiers used as baselines in our study. Finally, we use a BiLSTM with self-attention in a joint dual input learning setting, where we train a single neural network on the Hindi and Bengali datasets simultaneously using their respective embeddings.


Code-Mixed Sentiment Analysis Using Machine Learning and Neural Network Approaches

Aug 09, 2018
Pruthwik Mishra, Prathyusha Danda, Pranav Dhakras

The Sentiment Analysis for Indian Languages (SAIL)-Code Mixed tools contest aimed at identifying the sentence-level sentiment polarity of code-mixed datasets of Indian language pairs (Hi-En, Ben-Hi-En). The Hi-En dataset is henceforth referred to as HI-EN and the Ben-Hi-En dataset as BN-EN. For this, we submitted four models for sentiment analysis of the code-mixed HI-EN and BN-EN datasets. The first model was an ensemble voting classifier consisting of three classifiers (linear SVM, logistic regression and random forests), while the second one was a linear SVM. Both models used TF-IDF feature vectors of character n-grams, where n ranged from 2 to 6. We used the scikit-learn (sklearn) machine learning library to implement both approaches. Run1 was obtained from the voting classifier and Run2 used the linear SVM model to produce the results. Out of the four submitted outputs, Run2 outperformed Run1 on both datasets. We finished first in the contest for both HI-EN, with an F-score of 0.569, and BN-EN, with an F-score of 0.526.

* 6 pages
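
The two runs described in the abstract above map fairly directly onto scikit-learn components, so a rough sketch may help. It builds TF-IDF features over character n-grams with n from 2 to 6 and feeds them to either a voting ensemble (linear SVM, logistic regression, random forest) or a single linear SVM. Everything beyond those stated choices is an assumption: hard voting, default hyperparameters, the fixed seed, and the six toy code-mixed sentences with their labels are illustrative and do not come from the paper or the SAIL data.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def char_tfidf() -> TfidfVectorizer:
    # Character n-grams with n from 2 to 6, as stated in the abstract.
    return TfidfVectorizer(analyzer="char", ngram_range=(2, 6))


def build_voting_model(seed: int = 0):
    # Run1-style model. Hard (majority) voting is an assumption here;
    # the abstract does not say which voting scheme was used.
    ensemble = VotingClassifier(
        estimators=[
            ("svm", LinearSVC(random_state=seed)),
            ("logreg", LogisticRegression(max_iter=1000, random_state=seed)),
            ("forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ],
        voting="hard",
    )
    return make_pipeline(char_tfidf(), ensemble)


def build_svm_model(seed: int = 0):
    # Run2-style model: a single linear SVM on the same features.
    return make_pipeline(char_tfidf(), LinearSVC(random_state=seed))


# Hypothetical toy data, invented for illustration only.
texts = [
    "movie bahut acchi thi",
    "kya mast song hai",
    "I love this gaana",
    "yeh film bilkul bekaar hai",
    "service bahut kharab thi",
    "worst acting ever yaar",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

voting_model = build_voting_model().fit(texts, labels)
svm_model = build_svm_model().fit(texts, labels)

voting_preds = voting_model.predict(texts)
svm_preds = svm_model.predict(texts)
```

Character n-grams are a natural fit for code-mixed text because romanized Hindi and Bengali words have no standard spelling, and overlapping character sequences still match across spelling variants where whole-word features would not.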