"Sentiment": models, code, and papers

Automatic Domain Adaptation Outperforms Manual Domain Adaptation for Predicting Financial Outcomes

Jun 25, 2020
Marina Sedinkina, Nikolas Breitkopf, Hinrich Schütze

In this paper, we automatically create sentiment dictionaries for predicting financial outcomes. We compare three approaches: (i) manual adaptation of the domain-general dictionary H4N, (ii) automatic adaptation of H4N, and (iii) a combination consisting of first manual, then automatic adaptation. In our experiments, we demonstrate that the automatically adapted sentiment dictionary outperforms the previous state of the art in predicting the financial outcomes excess return and volatility. In particular, automatic adaptation performs better than manual adaptation. In our analysis, we find that annotation based on an expert's a priori belief about a word's meaning can be incorrect; annotation should instead be based on the word's contexts in the target domain.

* Accepted at ACL 2019
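To make the idea of a domain-adapted sentiment dictionary concrete, here is a minimal sketch of dictionary-based scoring: the share of a document's tokens that appear in a word list. The word lists, the sample sentence, and the function name `dictionary_share` are hypothetical and are not taken from the paper; the paper's actual features and adaptation procedure may differ.

```python
import re
from typing import Set


def dictionary_share(text: str, dictionary: Set[str]) -> float:
    """Return the fraction of tokens in `text` that occur in `dictionary`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)


# Hypothetical word lists: a domain-general negative list, and an adapted list
# in which "liability" has been dropped because in financial text it is
# usually a neutral accounting term.
GENERAL_NEGATIVE = {"liability", "loss", "decline"}
ADAPTED_NEGATIVE = {"loss", "decline"}

report = "The liability was settled and the loss was small"

general_score = dictionary_share(report, GENERAL_NEGATIVE)
adapted_score = dictionary_share(report, ADAPTED_NEGATIVE)
print(f"general: {general_score:.3f}, adapted: {adapted_score:.3f}")
```

In this toy case the general list counts two of nine tokens as negative while the adapted list counts one, which is the kind of difference that domain adaptation is meant to capture.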

From Cognitive to Computational Modeling: Text-based Risky Decision-Making Guided by Fuzzy Trace Theory

May 15, 2022
Jaron Mar, Jiamou Liu

Understanding, modelling and predicting human risky decision-making is challenging due to intrinsic individual differences and irrationality. Fuzzy trace theory (FTT) is a powerful paradigm that explains human decision-making by incorporating gists, i.e., fuzzy representations of information which capture only its quintessential meaning. Inspired by Broniatowski and Reyna's FTT cognitive model, we propose a computational framework which combines the effects of the underlying semantics and sentiments on text-based decision-making. In particular, we introduce Category-2-Vector to learn categorical gists and categorical sentiments, and demonstrate how our computational model can be optimised to predict risky decision-making in groups and individuals.

Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter

May 20, 2021
Boaz Shmueli, Soumya Ray, Lun-Wei Ku

Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

* To be published in ACL 2021. 7 pages, 4 figures, 2 tables 

Talaia: a Real time Monitor of Social Media and Digital Press

Sep 28, 2018
Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri

Talaia is a platform for monitoring social media and digital press. A configurable crawler gathers content for user-defined domains or topics. Crawled data is processed by the IXA-pipes NLP chain and the EliXa sentiment analysis system. A Django-powered interface visualizes the data so that users can analyze it. This paper presents the architecture of the system and describes its components in detail. To validate the approach, two real use cases are presented, one in the cultural domain and one in the political domain. An evaluation of the sentiment analysis task in both scenarios is also provided, showing the capacity for domain adaptation.

* Preprint draft, 21 pages 

Are Top School Students More Critical of Their Professors? Mining Comments on

Jan 23, 2021
Ziqi Tang, Yutong Wang, Jiebo Luo

Student reviews and comments on reflect realistic learning experiences of students. Such information provides a large-scale data source to examine the teaching quality of the lecturers. In this paper, we propose an in-depth analysis of these comments. First, we partition our data into different comparison groups. Next, we perform exploratory data analysis to delve into the data. Furthermore, we employ Latent Dirichlet Allocation and sentiment analysis to extract topics and understand the sentiments associated with the comments. We uncover interesting insights about the characteristics of both college students and professors. Our study proves that student reviews and comments contain crucial information and can serve as essential references for enrollment in courses and universities.

Predicting Different Types of Subtle Toxicity in Unhealthy Online Conversations

Jun 07, 2021
Shlok Gilda, Mirela Silva, Luiz Giovanini, Daniela Oliveira

This paper investigates the use of machine learning models for the classification of unhealthy online conversations containing one or more forms of subtler abuse, such as hostility, sarcasm, and generalization. We leveraged a public dataset of 44K online comments containing healthy and unhealthy comments labeled with seven forms of subtle toxicity. We were able to distinguish between these comments with a top micro F1-score, macro F1-score, and ROC-AUC of 88.76%, 67.98%, and 0.71, respectively. Hostile comments were easier to detect than other types of unhealthy comments. We also conducted a sentiment analysis which revealed that most types of unhealthy comments were associated with a slight negative sentiment, with hostile comments being the most negative ones.
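The gap between the micro and macro F1-scores reported above is typical when some labels are much rarer or harder than others. The following minimal sketch, using hypothetical toy labels and predictions rather than the paper's data, shows how the two averages are computed for multilabel output and why they can diverge.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multilabel predictions."""
    # Per-label counts stored as [tp, fp, fn].
    counts: Dict[str, List[int]] = {label: [0, 0, 0] for label in labels}
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Hypothetical toy data: the frequent label is always found, the rare one never.
labels = ["hostile", "sarcastic"]
y_true = [{"hostile"}, {"hostile"}, {"hostile"}, {"sarcastic"}, set()]
y_pred = [{"hostile"}, {"hostile"}, {"hostile"}, set(), set()]

micro, macro = micro_macro_f1(y_true, y_pred, labels)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

Here the frequent label is classified perfectly and the rare label is missed entirely, so the pooled (micro) score stays high while the per-label average (macro) drops to 0.5.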

Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings

May 15, 2022
Vasudeva Raju Sangaraju, Bharath Kumar Bolla, Deepak Kumar Nayak, Jyothsna Kh

Customers' reviews and comments are important for businesses to understand users' sentiment about the products and services. However, this data needs to be analyzed to assess the sentiment associated with topics/aspects to provide efficient customer assistance. LDA and LSA fail to capture the semantic relationship and are not specific to any domain. In this study, we evaluate BERTopic, a novel method that generates topics using sentence embeddings on Consumer Financial Protection Bureau (CFPB) data. Our work shows that BERTopic is flexible and yet provides meaningful and diverse topics compared to LDA and LSA. Furthermore, domain-specific pre-trained embeddings (FinBERT) yield even better topics. We evaluated the topics on coherence score (c_v) and UMass.

* Accepted at International Conference for Convergence in Technology, 2022 
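For readers unfamiliar with the UMass metric mentioned in the abstract, the sketch below computes a UMass-style coherence score from document co-occurrence counts, using the common formulation that sums log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs of a topic. The toy documents and the function name `umass_coherence` are hypothetical, and library implementations may normalise or average the sum differently, so this is an illustration rather than the evaluation code used in the study.

```python
import math
from typing import List, Set


def umass_coherence(topic_words: List[str], documents: List[Set[str]]) -> float:
    """UMass-style coherence for one topic.

    `topic_words` is assumed to be ordered from most to least probable.
    Values closer to zero indicate words that co-occur more often.
    """

    def doc_freq(*words: str) -> int:
        # Number of documents that contain all of the given words.
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = doc_freq(topic_words[l])
            if d_l == 0:
                raise ValueError(f"word not found in corpus: {topic_words[l]}")
            d_ml = doc_freq(topic_words[m], topic_words[l])
            score += math.log((d_ml + 1) / d_l)
    return score


# Hypothetical toy corpus, with each document reduced to its set of words.
docs = [{"loan", "bank"}, {"loan", "bank", "fee"}, {"fee", "card"}]

coherent = umass_coherence(["loan", "bank"], docs)    # words that co-occur
incoherent = umass_coherence(["loan", "card"], docs)  # words that never co-occur
print(f"coherent: {coherent:.3f}, incoherent: {incoherent:.3f}")
```

A topic whose top words appear together in many documents scores higher than one whose words rarely co-occur, which is what the comparison in the last lines illustrates.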

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

May 30, 2015
Kai Sheng Tai, Richard Socher, Christopher D. Manning

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).

* Accepted for publication at ACL 2015 

Analyzing Self-Driving Cars on Twitter

Apr 05, 2018
Rizwan Sadiq, Mohsin Khan

This paper studies users' perception of a controversial product, namely self-driving (autonomous) cars. To find people's opinions regarding this new technology, we used an annotated Twitter dataset and extracted the topics in positive and negative tweets using an unsupervised, probabilistic approach known as topic modeling. We then used the topics, as well as linguistic and Twitter-specific features, to classify the sentiment of the tweets. Regarding the opinions, our analysis shows that people are optimistic and excited about the future technology, but at the same time they find it dangerous and unreliable. For the classification task, we found Twitter-specific features, such as hashtags, and linguistic features, such as emphatic words, among the top attributes for classifying the sentiment of the tweets.

Automatically augmenting an emotion dataset improves classification using audio

Mar 30, 2018
Egor Lakomkin, Cornelius Weber, Stefan Wermter

In this work, we tackle the problem of speech emotion classification. One of the issues in the area of affective computing is that the amount of annotated data is very limited. On the other hand, the number of ways that the same emotion can be expressed verbally is enormous due to variability between speakers. This is one of the factors that limits performance and generalization. We propose a simple method that extracts audio samples from movies using textual sentiment analysis. As a result, it is possible to automatically construct a larger dataset of audio samples with positive, negative, and neutral emotional speech. We show that pretraining a recurrent neural network on such a dataset yields better results on the challenging EmotiW corpus. This experiment shows the potential benefit of combining textual sentiment analysis with vocal information.