Sentiment tasks such as hate speech detection and sentiment analysis, especially when performed on languages other than English, are often low-resource. In this study, we exploit the emotional information encoded in emojis to enhance the performance on a variety of sentiment tasks. This is done using a transfer learning approach, where the parameters learned by an emoji-based source task are transferred to a sentiment target task. We analyse the efficacy of the transfer under three conditions, i.e. i) the emoji content and ii) label distribution of the target task as well as iii) the difference between monolingually and multilingually learned source tasks. We find i.a. that the transfer is most beneficial if the target task is balanced with high emoji content. Monolingually learned source tasks have the benefit of taking into account the culturally specific use of emojis and gain up to F1 +0.280 over the baseline.
Aspect level sentiment classification aims to identify the sentiment expressed towards an aspect given a context sentence. Previous neural network based methods largely ignore the syntax structure in one sentence. In this paper, we propose a novel target-dependent graph attention network (TD-GAT) for aspect level sentiment classification, which explicitly utilizes the dependency relationship among words. Using the dependency graph, it propagates sentiment features directly from the syntactic context of an aspect target. In our experiments, we show our method outperforms multiple baselines with GloVe embeddings. We also demonstrate that using BERT representations further substantially boosts the performance.
Sentiment classification has been crucial for many natural language processing (NLP) applications, such as the analysis of movie reviews, tweets, or customer feedback. A sufficiently large amount of data is required to build a robust sentiment classification system. However, such resources are not always available for all domains or for all languages. In this work, we propose employing a machine translation (MT) system to translate customer feedback into another language to investigate in which cases translated sentences can have a positive or negative impact on an automatic sentiment classifier. Furthermore, as performing a direct translation is not always possible, we explore the performance of automatic classifiers on sentences that have been translated using a pivot MT system. We conduct several experiments using the above approaches to analyse the performance of our proposed sentiment classification system and discuss the advantages and drawbacks of classifying translated sentences.
The volume of discussions concerning brands within social media provides digital marketers with great opportunities for tracking and analyzing the feelings and views of consumers toward brands, products, influencers, services, and ad campaigns in CGC. The present study aims to assess and compare the performance of firms and celebrities (i.e., influencers that with the experience of being in an ad campaign of those companies) with the automated sentiment analysis that was employed for CGC at social media while exploring the feeling of the consumers toward them to observe which influencer (of two for each company) had a closer effect with the corresponding corporation on consumer minds. For this purpose, several consumer tweets from the pages of brands and influencers were utilized to make a comparison of machine learning and lexicon-based approaches to the sentiment analysis through the Naive algorithm (lexicon-based) and Naive Bayes algorithm (machine learning method) and obtain the desired results to assess the campaigns. The findings suggested that the approaches were dissimilar in terms of accuracy; the machine learning method yielded higher accuracy. Finally, the results showed which influencer was more appropriate according to their existence in previous campaigns and helped choose the right influencer in the future for our company and have a better, more appropriate, and more efficient ad campaign subsequently. It is required to conduct further studies on the accuracy improvement of the sentiment classification. This approach should be employed for other social media CGC types. The results revealed decision-making for which sentiment analysis methods are the best approaches for the analysis of social media. It was also found that companies should be aware of their consumers' sentiments and choose the right person every time they think of a campaign.
With the advent of word embeddings, lexicons are no longer fully utilized for sentiment analysis although they still provide important features in the traditional setting. This paper introduces a novel approach to sentiment analysis that integrates lexicon embeddings and an attention mechanism into Convolutional Neural Networks. Our approach performs separate convolutions for word and lexicon embeddings and provides a global view of the document using attention. Our models are experimented on both the SemEval'16 Task 4 dataset and the Stanford Sentiment Treebank, and show comparative or better results against the existing state-of-the-art systems. Our analysis shows that lexicon embeddings allow to build high-performing models with much smaller word embeddings, and the attention mechanism effectively dims out noisy words for sentiment analysis.
Since previous studies on open-domain targeted sentiment analysis are limited in dataset domain variety and sentence level, we propose a novel dataset consisting of 6,013 human-labeled data to extend the data domains in topics of interest and document level. Furthermore, we offer a nested target annotation schema to extract the complete sentiment information in documents, boosting the practicality and effectiveness of open-domain targeted sentiment analysis. Moreover, we leverage the pre-trained model BART in a sequence-to-sequence generation method for the task. Benchmark results show that there exists large room for improvement of open-domain targeted sentiment analysis. Meanwhile, experiments have shown that challenges remain in the effective use of open-domain data, long documents, the complexity of target structure, and domain variances.
In this paper, we explore the usability of different natural language processing models for the sentiment analysis of social media applied to financial market prediction, using the cryptocurrency domain as a reference. We study how the different sentiment metrics are correlated with the price movements of Bitcoin. For this purpose, we explore different methods to calculate the sentiment metrics from a text finding most of them not very accurate for this prediction task. We find that one of the models outperforms more than 20 other public ones and makes it possible to fine-tune it efficiently given its interpretable nature. Thus we confirm that interpretable artificial intelligence and natural language processing methods might be more valuable practically than non-explainable and non-interpretable ones. In the end, we analyse potential causal connections between the different sentiment metrics and the price movements.
Human communication is inherently multimodal and asynchronous. Analyzing human emotions and sentiment is an emerging field of artificial intelligence. We are witnessing an increasing amount of multimodal content in local languages on social media about products and other topics. However, there are not many multimodal resources available for under-resourced Dravidian languages. Our study aims to create a multimodal sentiment analysis dataset for the under-resourced Tamil and Malayalam languages. First, we downloaded product or movies review videos from YouTube for Tamil and Malayalam. Next, we created captions for the videos with the help of annotators. Then we labelled the videos for sentiment, and verified the inter-annotator agreement using Fleiss's Kappa. This is the first multimodal sentiment analysis dataset for Tamil and Malayalam by volunteer annotators.
We present a text-based framework for investigating moral sentiment change of the public via longitudinal corpora. Our framework is based on the premise that language use can inform people's moral perception toward right or wrong, and we build our methodology by exploring moral biases learned from diachronic word embeddings. We demonstrate how a parameter-free model supports inference of historical shifts in moral sentiment toward concepts such as slavery and democracy over centuries at three incremental levels: moral relevance, moral polarity, and fine-grained moral dimensions. We apply this methodology to visualizing moral time courses of individual concepts and analyzing the relations between psycholinguistic variables and rates of moral sentiment change at scale. Our work offers opportunities for applying natural language processing toward characterizing moral sentiment change in society.
The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.