Sociodemographic biases are a common problem for natural language processing, affecting the fairness and integrity of its applications. Within sentiment analysis, these biases may undermine sentiment predictions for texts that mention personal attributes that unbiased human readers would consider neutral. Such discrimination can have great consequences in the applications of sentiment analysis both in the public and private sectors. For example, incorrect inferences in applications like online abuse and opinion analysis in social media platforms can lead to unwanted ramifications, such as wrongful censoring, towards certain populations. In this paper, we address the discrimination against people with disabilities, PWD, done by sentiment analysis and toxicity classification models. We provide an examination of sentiment and toxicity analysis models to understand in detail how they discriminate PWD. We present the Bias Identification Test in Sentiments (BITS), a corpus of 1,126 sentences designed to probe sentiment analysis models for biases in disability. We use this corpus to demonstrate statistically significant biases in four widely used sentiment analysis tools (TextBlob, VADER, Google Cloud Natural Language API and DistilBERT) and two toxicity analysis models trained to predict toxic comments on Jigsaw challenges (Toxic comment classification and Unintended Bias in Toxic comments). The results show that all exhibit strong negative biases on sentences that mention disability. We publicly release BITS Corpus for others to identify potential biases against disability in any sentiment analysis tools and also to update the corpus to be used as a test for other sociodemographic variables as well.
Social media, as a means for computer-mediated communication, has been extensively used to study the sentiment expressed by users around events or topics. There is however a gap in the longitudinal study of how sentiment evolved in social media over the years. To fill this gap, we develop TM-Senti, a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets and covering a time period of over seven years. We describe and assess our methodology to put together a large-scale, emoticon- and emoji-based labelled sentiment analysis dataset, along with an analysis of the resulting dataset. Our analysis highlights interesting temporal changes, among others in the increasing use of emojis over emoticons. We publicly release the dataset for further research in tasks including sentiment analysis and text classification of tweets. The dataset can be fully rehydrated including tweet metadata and without missing tweets thanks to the archive of tweets publicly available on the Internet Archive, which the dataset is based on.
The increasing popularity of social networks and users' tendency towards sharing their feelings, expressions, and opinions in text, visual, and audio content, have opened new opportunities and challenges in sentiment analysis. While sentiment analysis of text streams has been widely explored in literature, sentiment analysis from images and videos is relatively new. This article focuses on visual sentiment analysis in a societal important domain, namely disaster analysis in social media. To this aim, we propose a deep visual sentiment analyzer for disaster related images, covering different aspects of visual sentiment analysis starting from data collection, annotation, model selection, implementation, and evaluations. For data annotation, and analyzing peoples' sentiments towards natural disasters and associated images in social media, a crowd-sourcing study has been conducted with a large number of participants worldwide. The crowd-sourcing study resulted in a large-scale benchmark dataset with four different sets of annotations, each aiming a separate task. The presented analysis and the associated dataset will provide a baseline/benchmark for future research in the domain. We believe the proposed system can contribute toward more livable communities by helping different stakeholders, such as news broadcasters, humanitarian organizations, as well as the general public.
Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users are increasingly using images and videos to express their opinions and share their experiences. Sentiment analysis of such large scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis. Motivated by the needs in leveraging large scale yet noisy training data to solve the extremely challenging problem of image sentiment analysis, we employ Convolutional Neural Networks (CNN). We first design a suitable CNN architecture for image sentiment analysis. We obtain half a million training samples by using a baseline sentiment algorithm to label Flickr images. To make use of such noisy machine labeled data, we employ a progressive strategy to fine-tune the deep network. Furthermore, we improve the performance on Twitter images by inducing domain transfer with a small number of manually labeled Twitter images. We have conducted extensive experiments on manually labeled Twitter images. The results show that the proposed CNN can achieve better performance in image sentiment analysis than competing algorithms.
Sentiment analysis on large-scale social media data is important to bridge the gaps between social media contents and real world activities including political election prediction, individual and public emotional status monitoring and analysis, and so on. Although textual sentiment analysis has been well studied based on platforms such as Twitter and Instagram, analysis of the role of extensive emoji uses in sentiment analysis remains light. In this paper, we propose a novel scheme for Twitter sentiment analysis with extra attention on emojis. We first learn bi-sense emoji embeddings under positive and negative sentimental tweets individually, and then train a sentiment classifier by attending on these bi-sense emoji embeddings with an attention-based long short-term memory network (LSTM). Our experiments show that the bi-sense embedding is effective for extracting sentiment-aware embeddings of emojis and outperforms the state-of-the-art models. We also visualize the attentions to show that the bi-sense emoji embedding provides better guidance on the attention mechanism to obtain a more robust understanding of the semantics and sentiments.
How could a product or service is reasonably evaluated by anyone in the shortest time? A million dollar question but it is having a simple answer: Sentiment analysis. Sentiment analysis is consumers review on products and services which helps both the producers and consumers (stakeholders) to take effective and efficient decision within a shortest period of time. Producers can have better knowledge of their products and services through the sentiment analysis (ex. positive and negative comments or consumers likes and dislikes) which will help them to know their products status (ex. product limitations or market status). Consumers can have better knowledge of their interested products and services through the sentiment analysis (ex. positive and negative comments or consumers likes and dislikes) which will help them to know their deserving products status (ex. product limitations or market status). For more specification of the sentiment values, fuzzy logic could be introduced. Therefore, sentiment analysis with the help of fuzzy logic (deals with reasoning and gives closer views to the exact sentiment values) will help the producers or consumers or any interested person for taking the effective decision according to their product or service interest.
With the increasing popularity of video sharing websites such as YouTube and Facebook, multimodal sentiment analysis has received increasing attention from the scientific community. Contrary to previous works in multimodal sentiment analysis which focus on holistic information in speech segments such as bag of words representations and average facial expression intensity, we develop a novel deep architecture for multimodal sentiment analysis that performs modality fusion at the word level. In this paper, we propose the Gated Multimodal Embedding LSTM with Temporal Attention (GME-LSTM(A)) model that is composed of 2 modules. The Gated Multimodal Embedding alleviates the difficulties of fusion when there are noisy modalities. The LSTM with Temporal Attention performs word level fusion at a finer fusion resolution between input modalities and attends to the most important time steps. As a result, the GME-LSTM(A) is able to better model the multimodal structure of speech through time and perform better sentiment comprehension. We demonstrate the effectiveness of this approach on the publicly-available Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis (CMU-MOSI) dataset by achieving state-of-the-art sentiment classification and regression results. Qualitative analysis on our model emphasizes the importance of the Temporal Attention Layer in sentiment prediction because the additional acoustic and visual modalities are noisy. We also demonstrate the effectiveness of the Gated Multimodal Embedding in selectively filtering these noisy modalities out. Our results and analysis open new areas in the study of sentiment analysis in human communication and provide new models for multimodal fusion.
Sentiment analysis is one of the fastest growing research areas in computer science, making it challenging to keep track of all the activities in the area. We present a computer-assisted literature review, where we utilize both text mining and qualitative coding, and analyze 6,996 papers from Scopus. We find that the roots of sentiment analysis are in the studies on public opinion analysis at the beginning of 20th century and in the text subjectivity analysis performed by the computational linguistics community in 1990's. However, the outbreak of computer-based sentiment analysis only occurred with the availability of subjective texts on the Web. Consequently, 99% of the papers have been published after 2004. Sentiment analysis papers are scattered to multiple publication venues, and the combined number of papers in the top-15 venues only represent ca. 30% of the papers in total. We present the top-20 cited papers from Google Scholar and Scopus and a taxonomy of research topics. In recent years, sentiment analysis has shifted from analyzing online product reviews to social media texts from Twitter and Facebook. Many topics beyond product reviews like stock markets, elections, disasters, medicine, software engineering and cyberbullying extend the utilization of sentiment analysis
Sentiment analysis in conversations has gained increasing attention in recent years for the growing amount of applications it can serve, e.g., sentiment analysis, recommender systems, and human-robot interaction. The main difference between conversational sentiment analysis and single sentence sentiment analysis is the existence of context information which may influence the sentiment of an utterance in a dialogue. How to effectively encode contextual information in dialogues, however, remains a challenge. Existing approaches employ complicated deep learning structures to distinguish different parties in a conversation and then model the context information. In this paper, we propose a fast, compact and parameter-efficient party-ignorant framework named bidirectional emotional recurrent unit for conversational sentiment analysis. In our system, a generalized neural tensor block followed by a two-channel classifier is designed to perform context compositionality and sentiment classification, respectively. Extensive experiments on three standard datasets demonstrate that our model outperforms the state of the art in most cases.
Sentiment analysis of social media data consists of attitudes, assessments, and emotions which can be considered a way human think. Understanding and classifying the large collection of documents into positive and negative aspects are a very difficult task. Social networks such as Twitter, Facebook, and Instagram provide a platform in order to gather information about peoples sentiments and opinions. Considering the fact that people spend hours daily on social media and share their opinion on various different topics helps us analyze sentiments better. More and more companies are using social media tools to provide various services and interact with customers. Sentiment Analysis (SA) classifies the polarity of given tweets to positive and negative tweets in order to understand the sentiments of the public. This paper aims to perform sentiment analysis of real-time 2019 election twitter data using the feature selection model word2vec and the machine learning algorithm random forest for sentiment classification. Word2vec with Random Forest improves the accuracy of sentiment analysis significantly compared to traditional methods such as BOW and TF-IDF. Word2vec improves the quality of features by considering contextual semantics of words in a text hence improving the accuracy of machine learning and sentiment analysis.