Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Sentiment Analysis": models, code, and papers

ArSentD-LEV: A Multi-Topic Corpus for Target-based Sentiment Analysis in Arabic Levantine Tweets

May 25, 2019
Ramy Baly, Alaa Khaddaj, Hazem Hajj, Wassim El-Hajj, Khaled Bashir Shaban

Sentiment analysis is a highly subjective and challenging task. Its complexity further increases when applied to the Arabic language, mainly because of the large variety of dialects that are unstandardized and widely used in the Web, especially in social media. While many datasets have been released to train sentiment classifiers in Arabic, most of these datasets contain shallow annotation, only marking the sentiment of the text unit, as a word, a sentence or a document. In this paper, we present the Arabic Sentiment Twitter Dataset for the Levantine dialect (ArSenTD-LEV). Based on findings from analyzing tweets from the Levant region, we created a dataset of 4,000 tweets with the following annotations: the overall sentiment of the tweet, the target to which the sentiment was expressed, how the sentiment was expressed, and the topic of the tweet. Results confirm the importance of these annotations at improving the performance of a baseline sentiment classifier. They also confirm the gap of training in a certain domain, and testing in another domain.

* Corpus development, Levantine tweets, multi-topic, sentiment analysis, sentiment target, LREC-2018, OSACT-2018 

Analysis of opinionated text for opinion mining

Jul 14, 2016
K Paramesha, K C Ravishankar

In sentiment analysis, the polarities of the opinions expressed on an object/feature are determined to assess the sentiment of a sentence or document whether it is positive/negative/neutral. Naturally, the object/feature is a noun representation which refers to a product or a component of a product, let us say, the "lens" in a camera and opinions emanating on it are captured in adjectives, verbs, adverbs and noun words themselves. Apart from such words, other meta-information and diverse effective features are also going to play an important role in influencing the sentiment polarity and contribute significantly to the performance of the system. In this paper, some of the associated information/meta-data are explored and investigated in the sentiment text. Based on the analysis results presented here, there is scope for further assessment and utilization of the meta-information as features in text categorization, ranking text document, identification of spam documents and polarity classification problems.

* Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016 
* Sentiment Analysis, Features, Feature Engineering, Emotions, Word Sense Disambiguation, Sentiment Lexicons, Meta-Information 

NoReC: The Norwegian Review Corpus

Oct 15, 2017
Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, Fredrik Jørgensen

This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1-6, as provided by the rating of the original author. This first release of the corpus comprises more than 35,000 reviews. It is distributed using the CoNLL-U format, pre-processed using UDPipe, along with a rich set of metadata. The work reported in this paper forms part of the SANT initiative (Sentiment Analysis for Norwegian Text), a project seeking to provide resources and tools for sentiment analysis and opinion mining for Norwegian. As resources for sentiment analysis have so far been unavailable for Norwegian, NoReC represents a highly valuable and sought-after addition to Norwegian language technology.

* Pending (non-anonymous) review for LREC 2018 

Bambara Language Dataset for Sentiment Analysis

Aug 05, 2021
Mountaga Diallo, Chayma Fourati, Hatem Haddad

For easier communication, posting, or commenting on each others posts, people use their dialects. In Africa, various languages and dialects exist. However, they are still underrepresented and not fully exploited for analytical studies and research purposes. In order to perform approaches like Machine Learning and Deep Learning, datasets are required. One of the African languages is Bambara, used by citizens in different countries. However, no previous work on datasets for this language was performed for Sentiment Analysis. In this paper, we present the first common-crawl-based Bambara dialectal dataset dedicated for Sentiment Analysis, available freely for Natural Language Processing research purposes.

* 2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios, International Conference on Learning Representations, 2021 

Sentiment Analysis of Cybersecurity Content on Twitter and Reddit

Apr 26, 2022
Bipun Thapa

Sentiment Analysis provides an opportunity to understand the subject(s), especially in the digital age, due to an abundance of public data and effective algorithms. Cybersecurity is a subject where opinions are plentiful and differing in the public domain. This descriptive research analyzed cybersecurity content on Twitter and Reddit to measure its sentiment, positive or negative, or neutral. The data from Twitter and Reddit was amassed via technology-specific APIs during a selected timeframe to create datasets, which were then analyzed individually for their sentiment by VADER, an NLP (Natural Language Processing) algorithm. A random sample of cybersecurity content (ten tweets and posts) was also classified for sentiments by twenty human annotators to evaluate the performance of VADER. Cybersecurity content on Twitter was at least 48% positive, and Reddit was at least 26.5% positive. The positive or neutral content far outweighed negative sentiments across both platforms. When compared to human classification, which was considered the standard or source of truth, VADER produced 60% accuracy for Twitter and 70% for Reddit in assessing the sentiment; in other words, some agreement between algorithm and human classifiers. Overall, the goal was to explore an uninhibited research topic about cybersecurity sentiment

* Content is peer-reviewed and was presented in 3rd International Conference on NLP & Information Retrieval (NLPI 2022) 

A Joint Model for Aspect-Category Sentiment Analysis with Contextualized Aspect Embedding

Aug 29, 2019
Yuncong Li, Cunxiang Yin, Ting Wei, Huiqiang Zhong, Jinchang Luo, Siqi Xu, Xiaohui Wu

Aspect-category sentiment analysis (ACSA) aims to identify all the aspect categories mentioned in the text and their corresponding sentiment polarities. Some joint models have been proposed to address this task. However, these joint models do not solve the following two problems well: mismatching between the aspect categories and the sentiment words, and data deficiency of some aspect categories. To solve them, we propose a novel joint model which contains a contextualized aspect embedding layer and a shared sentiment prediction layer. The contextualized aspect embedding layer extracts the aspect category related information, which is used to generate aspect-specific representations for sentiment classification like traditional context-independent aspect embedding (CIAE) and is therefore called contextualized aspect embedding (CAE). The CAE can mitigate the mismatching problem because it is semantically more related to sentiment words than CIAE. The shared sentiment prediction layer transfers sentiment knowledge between aspect categories and alleviates the problem caused by data deficiency. Experiments conducted on SemEval 2016 Datasets show that our proposed model achieves state-of-the-art performance.


Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

Jul 16, 2020
Lingwei Wei, Dou Hu, Wei Zhou, Xuehai Tang, Xiaodan Zhang, Xin Wang, Jizhong Han, Songlin Hu

Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicate sentiment information. Recent works have been devoted to leveraging text summarization and have achieved promising results. However, these summarization-based methods did not take full advantage of the summary including ignoring the inherent interactions between the summary and document. As a result, they limited the representation to express major points in the document, which is highly indicative of the key sentiment. In this paper, we study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA. A Hierarchical Interaction Networks (HIN) is proposed to explore bidirectional interactions between the summary and document at multiple granularities and learn subject-oriented document representations for sentiment classification. Furthermore, we design a Sentiment-based Rethinking mechanism (SR) by refining the HIN with sentiment label information to learn a more sentiment-aware document representation. We extensively evaluate our proposed models on three public datasets. The experimental results consistently demonstrate the effectiveness of our proposed models and show that HIN-SR outperforms various state-of-the-art methods.

* 16 pages, accepted by ECML-PKDD 2020 

On Gobbledygook and Mood of the Philippine Senate: An Exploratory Study on the Readability and Sentiment of Selected Philippine Senators' Microposts

Aug 06, 2015
Fatima M. Moncada, Jaderick P. Pabico

This paper presents the findings of a readability assessment and sentiment analysis of selected six Philippine senators' microposts over the popular Twitter microblog. Using the Simple Measure of Gobbledygook (SMOG), tweets of Senators Cayetano, Defensor-Santiago, Pangilinan, Marcos, Guingona, and Escudero were assessed. A sentiment analysis was also done to determine the polarity of the senators' respective microposts. Results showed that on the average, the six senators are tweeting at an eight to ten SMOG level. This means that, at least a sixth grader will be able to understand the senators' tweets. Moreover, their tweets are mostly neutral and their sentiments vary in unison at some period of time. This could mean that a senator's tweet sentiment is affected by specific Philippine-based events.

* 13 pages, 6 figures, submitted to the Asia Pacific Journal on Education, Arts, and Sciences 

Sentiment Analysis of German Twitter

Nov 29, 2019
Wladimir Sidorenko

This thesis explores the ways by how people express their opinions on German Twitter, examines current approaches to automatic mining of these feelings, and proposes novel methods, which outperform state-of-the-art techniques. For this purpose, I introduce a new corpus of German tweets that have been manually annotated with sentiments, their targets and holders, as well as polar terms and their contextual modifiers. Using these data, I explore four major areas of sentiment research: (i) generation of sentiment lexicons, (ii) fine-grained opinion mining, (iii) message-level polarity classification, and (iv) discourse-aware sentiment analysis. In the first task, I compare three popular groups of lexicon generation methods: dictionary-, corpus-, and word-embedding-based ones, finding that dictionary-based systems generally yield better lexicons than the last two groups. Apart from this, I propose a linear projection algorithm, whose results surpass many existing automatic lexicons. Afterwords, in the second task, I examine two common approaches to automatic prediction of sentiments, sources, and targets: conditional random fields and recurrent neural networks, obtaining higher scores with the former model and improving these results even further by redefining the structure of CRF graphs. When dealing with message-level polarity classification, I juxtapose three major sentiment paradigms: lexicon-, machine-learning-, and deep-learning-based systems, and try to unite the first and last of these groups by introducing a bidirectional neural network with lexicon-based attention. Finally, in order to make the new classifier aware of discourse structure, I let it separately analyze the elementary discourse units of each microblog and infer the overall polarity of a message from the scores of its EDUs with the help of two new approaches: latent-marginalized CRFs and Recursive Dirichlet Process.

* Ph.D. Dissertation 

[email protected]: Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags

Oct 20, 2020
Sainik Kumar Mahata, Dipankar Das, Sivaji Bandyopadhyay

Sentiment analysis has been an active area of research in the past two decades and recently, with the advent of social media, there has been an increasing demand for sentiment analysis on social media texts. Since the social media texts are not in one language and are largely code-mixed in nature, the traditional sentiment classification models fail to produce acceptable results. This paper tries to solve this very research problem and uses bi-directional LSTMs along with language tagging, to facilitate sentiment tagging of code-mixed Tamil texts that have been extracted from social media. The presented algorithm, when evaluated on the test data, garnered precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively.