Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Sentiment Analysis": models, code, and papers

Using Self-Organizing Maps for Sentiment Analysis

Sep 16, 2013
Anuj Sharma, Shubhamoy Dey

Web 2.0 services have enabled people to express their opinions, experience and feelings in the form of user-generated content. Sentiment analysis or opinion mining involves identifying, classifying and aggregating opinions as per their positive or negative polarity. This paper investigates the efficacy of different implementations of Self-Organizing Maps (SOM) for sentiment based visualization and classification of online reviews. Specifically, this paper implements the SOM algorithm for both supervised and unsupervised learning from text documents. The unsupervised SOM algorithm is implemented for sentiment based visualization and classification tasks. For supervised sentiment analysis, a competitive learning algorithm known as Learning Vector Quantization is used. Both algorithms are also compared with their respective multi-pass implementations where a quick rough ordering pass is followed by a fine tuning pass. The experimental results on the online movie review data set show that SOMs are well suited for sentiment based classification and sentiment polarity visualization.

* 13 Pages 

Word2Vec and Doc2Vec in Unsupervised Sentiment Analysis of Clinical Discharge Summaries

May 01, 2018
Qufei Chen, Marina Sokolova

In this study, we explored application of Word2Vec and Doc2Vec for sentiment analysis of clinical discharge summaries. We applied unsupervised learning since the data sets did not have sentiment annotations. Note that unsupervised learning is a more realistic scenario than supervised learning which requires an access to a training set of sentiment-annotated data. We aim to detect if there exists any underlying bias towards or against a certain disease. We used SentiWordNet to establish a gold sentiment standard for the data sets and evaluate performance of Word2Vec and Doc2Vec methods. We have shown that the Word2vec and Doc2Vec methods complement each other results in sentiment analysis of the data sets.

* 23 pages, 3 figures, 16 tables 

Current Landscape of the Russian Sentiment Corpora

Jun 28, 2021
Evgeny Kotelnikov

Currently, there are more than a dozen Russian-language corpora for sentiment analysis, differing in the source of the texts, domain, size, number and ratio of sentiment classes, and annotation method. This work examines publicly available Russian-language corpora, presents their qualitative and quantitative characteristics, which make it possible to get an idea of the current landscape of the corpora for sentiment analysis. The ranking of corpora by annotation quality is proposed, which can be useful when choosing corpora for training and testing. The influence of the training dataset on the performance of sentiment analysis is investigated based on the use of the deep neural network model BERT. The experiments with review corpora allow us to conclude that on average the quality of models increases with an increase in the number of training corpora. For the first time, quality scores were obtained for the corpus of reviews of ROMIP seminars based on the BERT model. Also, the study proposes the task of the building a universal model for sentiment analysis.

* 12 pages, 5 tables, 2 figures. Accepted to Dialogue-2021 conference 

Sentiment Analysis Challenges in Persian Language

Jul 09, 2019
Mohammad Heydari

The rapid growth in data on the internet requires a data mining process to reach a decision to support insight. The Persian language has strong potential for deep research in any aspect of natural language processing, especially sentimental analysis approach. Thousands of websites and blogs updates and modifies by Persian users around the world that contains millions of Persian context. This range of application requires a comprehensive structured framework to extract beneficial information for helping enterprises to enhance their business and initiate a customer-centric management process by producing effective recommender systems. Sentimental analysis is an intelligent approach for extracting useful information from huge amounts of data to help an enterprise for smart management process. In this road, machine learning and deep learning techniques will become very helpful but there is the number of challenges which are face to them. This paper tried to present and assert the most important challenges of sentimental analysis in the Persian language. This language is an Indo-European language which spoken by over 110 million people around the world and is an official language in Iran, Tajikistan, and Afghanistan. Its also widely used in Uzbekistan, Pakistan and Turkish by order.

* 10 Pages; 2 Figure 

Unsupervised Sentiment Analysis for Code-mixed Data

Jan 20, 2020
Siddharth Yadav, Tanmoy Chakraborty

Code-mixing is the practice of alternating between two or more languages. Mostly observed in multilingual societies, its occurrence is increasing and therefore its importance. A major part of sentiment analysis research has been monolingual, and most of them perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis of code-mixed text. Our methods can handle code-mixed text through a zero-shot learning. Our methods beat state-of-the-art on English-Spanish code-mixed sentiment analysis by absolute 3\% F1-score. We are able to achieve 0.58 F1-score (without parallel corpus) and 0.62 F1-score (with parallel corpus) on the same benchmark in a zero-shot way as compared to 0.68 F1-score in supervised settings. Our code is publicly available.


Sentence Constituent-Aware Aspect-Category Sentiment Analysis with Graph Attention Networks

Oct 04, 2020
Yuncong Li, Cunxiang Yin, Sheng-hua Zhong

Aspect category sentiment analysis (ACSA) aims to predict the sentiment polarities of the aspect categories discussed in sentences. Since a sentence usually discusses one or more aspect categories and expresses different sentiments toward them, various attention-based methods have been developed to allocate the appropriate sentiment words for the given aspect category and obtain promising results. However, most of these methods directly use the given aspect category to find the aspect category-related sentiment words, which may cause mismatching between the sentiment words and the aspect categories when an unrelated sentiment word is semantically meaningful for the given aspect category. To mitigate this problem, we propose a Sentence Constituent-Aware Network (SCAN) for aspect-category sentiment analysis. SCAN contains two graph attention modules and an interactive loss function. The graph attention modules generate representations of the nodes in sentence constituency parse trees for the aspect category detection (ACD) task and the ACSA task, respectively. ACD aims to detect aspect categories discussed in sentences and is a auxiliary task. For a given aspect category, the interactive loss function helps the ACD task to find the nodes which can predict the aspect category but can't predict other aspect categories. The sentiment words in the nodes then are used to predict the sentiment polarity of the aspect category by the ACSA task. The experimental results on five public datasets demonstrate the effectiveness of SCAN.

* Long paper accepted by NLPCC 2020 

Sentiment Analysis on Brazilian Portuguese User Reviews

Dec 10, 2021
Frederico Souza, João Filho

Sentiment Analysis is one of the most classical and primarily studied natural language processing tasks. This problem had a notable advance with the proposition of more complex and scalable machine learning models. Despite this progress, the Brazilian Portuguese language still disposes only of limited linguistic resources, such as datasets dedicated to sentiment classification, especially when considering the existence of predefined partitions in training, testing, and validation sets that would allow a more fair comparison of different algorithm alternatives. Motivated by these issues, this work analyzes the predictive performance of a range of document embedding strategies, assuming the polarity as the system outcome. This analysis includes five sentiment analysis datasets in Brazilian Portuguese, unified in a single dataset, and a reference partitioning in training, testing, and validation sets, both made publicly available through a digital repository. A cross-evaluation of dataset-specific models over different contexts is conducted to evaluate their generalization capabilities and the feasibility of adopting a unique model for addressing all scenarios.

* 6 pages, 2 figures, 6 tables. Accepted and presented at IEEE Latin American Conference on Computational Intelligence (LA-CCI 2021), but not yet published 

Multimodal Sentiment Analysis To Explore the Structure of Emotions

May 25, 2018
Anthony Hu, Seth Flaxman

We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different than the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model and compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.

* Accepted as a conference paper at KDD 2018 

SANA : Sentiment Analysis on Newspapers comments in Algeria

May 31, 2020
Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi

It is very current in today life to seek for tracking the people opinion from their interaction with occurring events. A very common way to do that is comments in articles published in newspapers web sites dealing with contemporary events. Sentiment analysis or opinion mining is an emergent field who is the purpose is finding the behind phenomenon masked in opinionated texts. We are interested in our work by comments in Algerian newspaper websites. For this end, two corpora were used SANA and OCA. SANA corpus is created by collection of comments from three Algerian newspapers, and annotated by two Algerian Arabic native speakers, while OCA is a freely available corpus for sentiment analysis. For the classification we adopt Supports vector machines, naive Bayes and knearest neighbors. Obtained results are very promising and show the different effects of stemming in such domain, also knearest neighbors give important improvement comparing to other classifiers unlike similar works where SVM is the most dominant. From this study we observe the importance of dedicated resources and methods the newspaper comments sentiment analysis which we look forward in future works.

* 9 pages, 2 figures, 12 tables 

Sentiment Analysis of Twitter Data for Predicting Stock Market Movements

Oct 28, 2016
Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, Babita Majhi

Predicting stock market movements is a well-known problem of interest. Now-a-days social media is perfectly representing the public sentiment and opinion about current events. Especially, twitter has attracted a lot of attention from researchers for studying the public sentiments. Stock market prediction on the basis of public sentiments expressed on twitter has been an intriguing field of research. Previous studies have concluded that the aggregate public mood collected from twitter may well be correlated with Dow Jones Industrial Average Index (DJIA). The thesis of this work is to observe how well the changes in stock prices of a company, the rises and falls, are correlated with the public opinions being expressed in tweets about that company. Understanding author's opinion from a piece of text is the objective of sentiment analysis. The present paper have employed two different textual representations, Word2vec and N-gram, for analyzing the public sentiments in tweets. In this paper, we have applied sentiment analysis and supervised machine learning principles to the tweets extracted from twitter and analyze the correlation between stock market movements of a company and sentiments in tweets. In an elaborate way, positive news and tweets in social media about a company would definitely encourage people to invest in the stocks of that company and as a result the stock price of that company would increase. At the end of the paper, it is shown that a strong correlation exists between the rise and falls in stock prices with the public sentiments in tweets.

* 6 pages 4 figures Conference Paper