"Sentiment": models, code, and papers

Gender stereotypes in the mediated personalization of politics: Empirical evidence from a lexical, syntactic and sentiment analysis

Feb 07, 2022
Emanuele Brugnoli, Rosaria Simone, Marco Delmastro

The media attention to the personal sphere of famous and important individuals has become a key element of the gender narrative. Here we combine lexical, syntactic and sentiment analysis to investigate the role of gender in the personalization of a wide range of political office holders in Italy during the period 2017-2020. On the basis of a score for words that is introduced to account for gender imbalance in both representative and news coverage, we show that the political personalization in Italy is more detrimental for women than men, with the persistence of entrenched stereotypes including a masculine connotation of leadership, the resulting women's unsuitability to hold political functions, and a greater focus on their attractiveness and body parts. In addition, women politicians are covered with a more negative tone than their male counterparts when personal details are reported. Further, the major contribution to the observed gender differences comes from online news rather than print news, suggesting that the expression of certain stereotypes may be better conveyed when click baiting and personal targeting have a major impact.


Regularised Text Logistic Regression: Key Word Detection and Sentiment Classification for Online Reviews

Sep 09, 2020
Ying Chen, Peng Liu, Chung Piaw Teo

Online customer reviews have become important for managers and executives in the hospitality and catering industry who wish to obtain a comprehensive understanding of their customers' demands and expectations. We propose a Regularized Text Logistic (RTL) regression model to perform text analytics and sentiment classification on unstructured text data, which automatically identifies a set of statistically significant and operationally insightful word features, and achieves satisfactory predictive classification accuracy. We apply the RTL model to two online review datasets, Restaurant and Hotel, from TripAdvisor. Our results demonstrate satisfactory classification performance compared with alternative classifiers with a highest true positive rate of 94.9%. Moreover, RTL identifies a small set of word features, corresponding to 3% for Restaurant and 20% for Hotel, which boosts working efficiency by allowing managers to drill down into a much smaller set of important customer reviews. We also develop the consistency, sparsity and oracle property of the estimator.


Knowledge Discovery from Social Media using Big Data provided Sentiment Analysis (SoMABiT)

Jan 16, 2020
Mahdi Bohlouli, Jens Dalter, Mareike Dornhöfer, Johannes Zenkert, Madjid Fathi

In today's competitive business world, being aware of customer needs and market-oriented production is a key success factor for industries. To this aim, the use of efficient analytic algorithms ensures a better understanding of customer feedback and improves the next generation of products. Accordingly, the dramatic increase in using social media in daily life provides beneficial sources for market analytics. But how traditional analytic algorithms and methods can scale up for such disparate and multi-structured data sources is the main challenge in this regard. This paper presents and discusses the technological and scientific focus of SoMABiT as a social media analysis platform using big data technology. Sentiment analysis has been employed in order to discover knowledge from social media. The use of MapReduce and the development of a distributed algorithm towards an integrated platform that can scale for any data volume and provide social media-driven knowledge is the main novelty of the proposed concept in comparison to the state-of-the-art technologies.
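The abstract names MapReduce but gives no algorithmic detail, so the following is only a minimal single-process sketch of the general map, shuffle and reduce pattern applied to sentiment scoring. The word lists, the `(topic, text)` post format and all function names are hypothetical and are not taken from SoMABiT.

```python
from collections import defaultdict

# Hypothetical toy lexicons; a real system would use a proper sentiment model.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "broken", "slow"}

def map_post(post):
    """Map step: turn one (topic, text) post into a (topic, score) pair."""
    topic, text = post
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return topic, score

def shuffle(pairs):
    """Shuffle step: group all emitted scores by their topic key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_scores(values):
    """Reduce step: combine the scores of one topic into a single total."""
    return sum(values)

def run(posts):
    """Run the three phases in sequence on an in-memory list of posts."""
    mapped = map(map_post, posts)
    return {topic: reduce_scores(scores) for topic, scores in shuffle(mapped).items()}

posts = [
    ("phone", "great camera love it"),
    ("phone", "slow and broken"),
    ("laptop", "good keyboard"),
]
totals = run(posts)
```

In a distributed setting the map and reduce steps would run on separate workers, and the shuffle would be handled by the framework; here everything runs in one process purely to show the data flow.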


Analyzing Curriculum Learning for Sentiment Analysis along Task Difficulty, Pacing and Visualization Axes

Mar 03, 2021
Anvesh Rao Vijjini, Kaveri Anuranjana, Radhika Mamidi

While Curriculum Learning (CL) has recently gained traction in Natural Language Processing tasks, it is still not adequately analyzed. Previous works only show its effectiveness but fall short of fully explaining and interpreting its internal workings. In this paper, we analyze curriculum learning in sentiment analysis along multiple axes. Some of these axes have been proposed by earlier works that need more in-depth study. Such analysis requires understanding where curriculum learning works and where it does not. Our axes of analysis include Task difficulty on CL, comparing CL pacing techniques, and qualitative analysis by visualizing the movement of attention scores in the model as curriculum phases progress. We find that curriculum learning works best for difficult tasks and may even lead to a decrement in performance for tasks with higher performance without curriculum learning. We see that One-Pass curriculum strategies suffer from catastrophic forgetting and attention movement visualization within curriculum pacing. This shows that curriculum learning breaks down the challenging main task into easier sub-tasks solved sequentially.

* Accepted for presentation at WASSA 2021 at EACL 


Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study

Apr 09, 2020
Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient. However, the decision to choose a domain (known as the source domain) to leverage from is, at best, intuitive. In this paper, we investigate text similarity metrics to facilitate source domain selection for CDSA. We report results on 20 domains (all possible pairs) using 11 similarity metrics. Specifically, we compare CDSA performance with these metrics for different domain-pairs to enable the selection of a suitable source domain, given a target domain. These metrics include two novel metrics for evaluating domain adaptability to help source domain selection of labelled data and utilize word and sentence-based embeddings as metrics for unlabelled data. The goal of our experiments is a recommendation chart that gives the K best source domains for CDSA for a given target domain. We show that the best K source domains returned by our similarity metrics have a precision of over 50%, for varying values of K.

* 12th Edition of Language Resources and Evaluation Conference (LREC 2020) 
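The idea of ranking candidate source domains by text similarity to a target domain can be sketched as follows. This is an illustration only: cosine similarity over bag-of-words counts is used as a stand-in and is not claimed to be one of the paper's 11 metrics, and the function names and toy domains are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(texts):
    """Count lowercase whitespace-separated tokens over all texts of a domain."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_sources(target_texts, source_domains, k):
    """Return the k source-domain names most similar to the target domain."""
    target = bag_of_words(target_texts)
    scored = [
        (cosine(target, bag_of_words(texts)), name)
        for name, texts in source_domains.items()
    ]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]

sources = {
    "laptops": ["battery screen keyboard"],
    "hotels": ["room staff breakfast"],
    "phones": ["battery screen camera"],
}
ranking = top_k_sources(["camera battery and screen quality"], sources, k=2)
```

A recommendation chart in the sense of the abstract would repeat this ranking for every target domain; evaluating it would additionally require the actual cross-domain classification results, which this sketch does not model.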


gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data

Oct 09, 2020
Sunil Gundapu, Radhika Mamidi

The phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance is called Code-Mixing. This is more evident in multilingual societies. In this paper, we have developed a system for SemEval 2020: Task 9 on Sentiment Analysis for Code-Mixed Social Media Text. Our system first generates two types of embeddings for the social media text. The first is character-level embeddings, which encode character-level information and handle out-of-vocabulary entries; the second is FastText word embeddings, which capture morphology and semantics. These two embeddings were passed to the LSTM network and the system outperformed the baseline model.

* 6 pages, 2 figures 


On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

Aug 23, 2018
Jose Camacho-Collados, Mohammad Taher Pilehvar

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually-overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.

* Blackbox EMNLP 2018. 7 pages 
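To make the preprocessing decisions named in the abstract concrete, here is a minimal sketch of tokenizing, lowercasing and multiword grouping as switchable steps. Lemmatizing is left out because it needs a lexicon or external library. The regular expression, the multiword list and the function names are assumptions for illustration, not the authors' pipeline.

```python
import re

# Hypothetical multiword expressions to merge into single tokens.
MULTIWORDS = {("new", "york"), ("machine", "learning")}

def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]

def group_multiwords(tokens, multiwords=MULTIWORDS):
    """Join adjacent token pairs listed in `multiwords` with an underscore."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in multiwords:
            out.append("_".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def preprocess(text, lower=True, multiword=True):
    """Tokenize, then optionally lowercase and group multiword expressions."""
    tokens = tokenize(text)
    if lower:
        tokens = lowercase(tokens)
    if multiword:
        tokens = group_multiwords(tokens)
    return tokens

full = preprocess("Machine learning in New York!")
tokens_only = preprocess("Machine learning in New York!", lower=False, multiword=False)
```

Keeping each step behind a flag makes it easy to feed differently preprocessed versions of the same corpus to one classifier, which is the kind of comparison the abstract describes.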


Reproducibility, Replicability and Beyond: Assessing Production Readiness of Aspect Based Sentiment Analysis in the Wild

Jan 23, 2021
Rajdeep Mukherjee, Shreyas Shetty, Subrata Chattopadhyay, Subhadeep Maji, Samik Datta, Pawan Goyal

With the exponential growth of online marketplaces and user-generated content therein, aspect-based sentiment analysis has become more important than ever. In this work, we critically review a representative sample of the models published during the past six years through the lens of a practitioner, with an eye towards deployment in production. First, our rigorous empirical evaluation reveals poor reproducibility: an average 4-5% drop in test accuracy across the sample. Second, to further bolster our confidence in empirical evaluation, we report experiments on two challenging data slices, and observe a consistent 12-55% drop in accuracy. Third, we study the possibility of transfer across domains and observe that as little as 10-25% of the domain-specific training dataset, when used in conjunction with datasets from other domains within the same locale, largely closes the gap between complete cross-domain and complete in-domain predictive performance. Lastly, we open-source two large-scale annotated review corpora from a large e-commerce portal in India in order to aid the study of replicability and transfer, with the hope that it will fuel further growth of the field.

* 12 pages, accepted at ECIR 2021 


A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter

Oct 12, 2018
Eric M. Clark, Ted James, Chris A. Jones, Amulya Alapati, Promise Ukandu, Christopher M. Danforth, Peter Sheridan Dodds

Background: Social media has the capacity to provide the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality of life indicators both during and post treatment. In prior work, [Crannell et al.], we have studied an active cancer patient population on Twitter and compiled a set of tweets describing their experience with this disease. We refer to these online public testimonies as "Invisible Patient Reported Outcomes" (iPROs), because they carry relevant indicators, yet are difficult to capture by conventional means of self-report. Methods: Our present study aims to identify tweets related to the patient experience as an additional informative tool for monitoring public health. Using Twitter's public streaming API, we compiled over 5.3 million "breast cancer" related tweets spanning September 2016 until mid-December 2017. We combined supervised machine learning methods with natural language processing to sift tweets relevant to breast cancer patient experiences. We analyzed a sample of 845 breast cancer patient and survivor accounts, responsible for over 48,000 posts. We investigated tweet content with a hedonometric sentiment analysis to quantitatively extract emotionally charged topics. Results: We found that positive experiences were shared regarding patient treatment, raising support, and spreading awareness. Further discussions related to healthcare were prevalent and largely negative, focusing on fear of political legislation that could result in loss of coverage. Conclusions: Social media can provide a positive outlet for patients to discuss their needs and concerns regarding their healthcare coverage and treatment needs. Capturing iPROs from online communication can help inform healthcare professionals and lead to more connected and personalized treatment regimens.
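The abstract does not spell out the hedonometric computation. A common formulation scores a text by the frequency-weighted average of per-word happiness ratings, often skipping words with near-neutral ratings; the sketch below assumes that formulation. The word ratings, the 1 to 9 scale and the neutral band are hypothetical placeholders, not values from the study.

```python
from collections import Counter

# Hypothetical happiness ratings on an assumed 1 (sad) to 9 (happy) scale.
HAPPINESS = {
    "love": 8.4, "support": 7.0, "hope": 7.4,
    "fear": 2.3, "loss": 2.8, "the": 5.0,
}

def average_happiness(texts, scores=HAPPINESS, neutral_band=(4.0, 6.0)):
    """Frequency-weighted mean rating of all rated, non-neutral words.

    Words missing from `scores`, or rated strictly inside `neutral_band`,
    are ignored. Returns None when no word contributes.
    """
    counts = Counter(w for text in texts for w in text.lower().split())
    low, high = neutral_band
    total = 0.0
    weight = 0
    for word, freq in counts.items():
        rating = scores.get(word)
        if rating is None or low < rating < high:
            continue
        total += rating * freq
        weight += freq
    return total / weight if weight else None

tweets = ["love and support", "fear the loss"]
score = average_happiness(tweets)
```

Comparing such averages between two sets of tweets, and inspecting which words drive the difference, is one way emotionally charged topics could be surfaced.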
