"Sentiment": models, code, and papers

Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation

Jun 27, 2012
Yuan Shi, Fei Sha

We study the problem of unsupervised domain adaptation, which aims to adapt classifiers trained on a labeled source domain to an unlabeled target domain. Many existing approaches first learn domain-invariant features and then construct classifiers with them. We propose a novel approach that jointly learns both. Specifically, while the method identifies a feature space where data in the source and the target domains are similarly distributed, it also learns the feature space discriminatively, optimizing an information-theoretic metric as a proxy for the expected misclassification error on the target domain. We show how this optimization can be effectively carried out with simple gradient-based methods and how hyperparameters can be cross-validated without demanding any labeled data from the target domain. Empirical studies on benchmark tasks of object recognition and sentiment analysis validated our modeling assumptions and demonstrated significant improvement of our method over competing ones in classification accuracies.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012) 

UTNLP at SemEval-2022 Task 6: A Comparative Analysis of Sarcasm Detection using generative-based and mutation-based data augmentation

Apr 18, 2022
Amirhossein Abaskohi, Arash Rasouli, Tanin Zeraati, Behnam Bahrak

Sarcasm is a term that refers to the use of words to mock, irritate, or amuse someone. It is commonly used on social media. The metaphorical and creative nature of sarcasm presents a significant difficulty for sentiment analysis systems based on affective computing. The methodology and results of our team, UTNLP, in the SemEval-2022 shared task 6 on sarcasm detection are presented in this paper. We test different models and data augmentation approaches and report which work best. The tests begin with traditional machine learning models and progress to transformer-based and attention-based models. We employed data augmentation based on data mutation and data generation. Using RoBERTa and mutation-based data augmentation, our best approach achieved an F1-sarcastic of 0.38 in the competition's evaluation phase. After the competition, we fixed our model's flaws and achieved an F1-sarcastic of 0.414.

* 6 pages, 2 figures, NAACL 2022 Workshop SemEval
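
The F1-sarcastic figures quoted in this abstract are, on the usual reading of the metric name, the F1 score computed for the sarcastic class alone. The following is a minimal sketch of that computation, assuming binary labels where 1 marks a sarcastic text; the function name `f1_for_class` and the toy labels are illustrative and do not come from the paper or the shared task's scorer.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1 score for a single class (here assumed to be the sarcastic class)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: sarcastic texts predicted as sarcastic.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-sarcastic texts predicted as sarcastic.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: sarcastic texts the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are zero or undefined; report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels): 3 sarcastic texts out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
score = f1_for_class(y_true, y_pred)
print(round(score, 3))
```

Because only the sarcastic class counts, a model that labels most texts as non-sarcastic can have high overall accuracy and still score low on this metric, which is one reason it suits imbalanced sarcasm data.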

Semi-supervised Formality Style Transfer using Language Model Discriminator and Mutual Information Maximization

Oct 10, 2020
Kunal Chawla, Diyi Yang

Formality style transfer is the task of converting informal sentences to grammatically correct formal sentences, which can be used to improve performance of many downstream NLP tasks. In this work, we propose a semi-supervised formality style transfer model that utilizes a language model-based discriminator to maximize the likelihood of the output sentence being formal, which allows us to use maximization of token-level conditional probabilities for training. We further propose to maximize mutual information between source and target styles as our training objective instead of maximizing the regular likelihood that often leads to repetitive and trivial generated responses. Experiments showed that our model outperformed previous state-of-the-art baselines significantly in terms of both automated metrics and human judgement. We further generalized our model to the unsupervised text style transfer task, and achieved significant improvements on two benchmark sentiment style transfer datasets.

* EMNLP 2020 Findings 

Subjective Metrics-based Cloud Market Performance Prediction

Sep 21, 2020
Ahmed Alharbi, Hai Dong

This paper explores an effective machine learning approach to predict cloud market performance for cloud consumers, providers and investors based on social media. We identified a set of comprehensive subjective metrics that may affect cloud market performance via literature survey. We used a popular sentiment analysis technique to process customer reviews collected from social media. Cloud market revenue growth was selected as an indicator of cloud market performance. We considered the revenue growth of Amazon Web Services as the stakeholder of our experiments. Three machine learning models were selected: linear regression, artificial neural network, and support vector machine. These models were compared with a time series prediction model. We found that the set of subjective metrics is able to improve the prediction performance for all the models. The support vector machine showed the best prediction results compared to the other models.

Impact of News on the Commodity Market: Dataset and Results

Sep 09, 2020
Ankur Sinha, Tanmay Khandait

Over the last few years, machine learning based methods have been applied to extract information from news flow in the financial domain. However, this information has mostly been in the form of the financial sentiments contained in the news headlines, primarily for the stock prices. In our current work, we propose that various other dimensions of information can be extracted from news headlines, which will be of interest to investors, policy-makers and other practitioners. We propose a framework that extracts information such as past movements and expected directionality in prices, asset comparison and other general information that the news is referring to. We apply this framework to the commodity "Gold" and train the machine learning models using a dataset of 11,412 human-annotated news headlines (released with this study), collected from the period 2000-2019. We experiment to validate the causal effect of news flow on gold prices and observe that the information produced from our framework significantly impacts the future gold price.

* 13 Pages, 2 Figures, 3 Tables 

Stance Detection on Social Media: State of the Art and Trends

Jun 11, 2020
Abeer AlDayel, Walid Magdy

Stance detection on social media is an emerging opinion mining paradigm for various social and political applications where sentiment analysis might be sub-optimal. This paper surveys the work on stance detection and situates its usage within current opinion mining techniques in social media. An exhaustive review of stance detection techniques on social media is presented, including the task definition, the different types of targets in stance detection, the feature sets used, and the various machine learning approaches applied. The survey reports the state-of-the-art results on the existing benchmark datasets on stance detection, and discusses the most effective approaches. In addition, this study explores the emerging trends and the different applications of stance detection on social media. The study concludes by discussing the gaps in existing research and highlighting possible future directions for stance detection on social media.

Causal Modeling of Twitter Activity During COVID-19

Jun 06, 2020
Oguzhan Gencoglu, Mathias Gruber

Understanding the characteristics of public attention and perception is an essential prerequisite for appropriate crisis management during adverse health events. This is even more crucial during a pandemic such as COVID-19, as the primary responsibility for risk management is not centralized in a single institution, but distributed across society. While numerous studies utilize Twitter data in a descriptive or predictive context during the COVID-19 pandemic, causal modeling of public attention has not been investigated. In this study, we propose a causal inference approach to discover and quantify causal relationships between pandemic characteristics (e.g. number of infections and deaths) and Twitter activity as well as public sentiment. Our results show that the proposed method can successfully capture the epidemiological domain knowledge and identify variables that affect public attention and perception. We believe our work contributes to the field of infodemiology by distinguishing events that correlate with public attention from events that cause public attention.

* 12 pages, 3 figures 

Detecting Perceived Emotions in Hurricane Disasters

Apr 29, 2020
Shrey Desai, Cornelia Caragea, Junyi Jessy Li

Natural disasters (e.g., hurricanes) affect millions of people each year, causing widespread destruction in their wake. People have recently taken to social media websites (e.g., Twitter) to share their sentiments and feelings with the larger community. Consequently, these platforms have become instrumental in understanding and perceiving emotions at scale. In this paper, we introduce HurricaneEmo, an emotion dataset of 15,000 English tweets spanning three hurricanes: Harvey, Irma, and Maria. We present a comprehensive study of fine-grained emotions and propose classification tasks to discriminate between coarse-grained emotion groups. Our best BERT model, even after task-guided pre-training which leverages unlabeled Twitter data, achieves only 68% accuracy (averaged across all groups). HurricaneEmo serves not only as a challenging benchmark for models but also as a valuable resource for analyzing emotions in disaster-centric domains.

* Accepted to ACL 2020; code available at 

Depressed individuals express more distorted thinking on social media

Feb 07, 2020
Krishna C. Bathina, Marijn ten Thij, Lorenzo Lorenzo-Luaces, Lauren A. Rutter, Johan Bollen

Depression is a leading cause of disability worldwide, but is often under-diagnosed and under-treated. One of the tenets of cognitive-behavioral therapy (CBT) is that individuals who are depressed exhibit distorted modes of thinking, so-called cognitive distortions, which can negatively affect their emotions and motivation. Here, we show that individuals with a self-reported diagnosis of depression on social media express higher levels of distorted thinking than a random sample. Some types of distorted thinking were found to be more than twice as prevalent in our depressed cohort, in particular Personalizing and Emotional Reasoning. This effect is specific to the distorted content of the expression and cannot be explained by the presence of specific topics, sentiment, or first-person pronouns. Our results point towards the detection, and possibly mitigation, of patterns of online language that are generally deemed depressogenic. They may also provide insight into recent observations that social media usage can have a negative impact on mental health.
