Citation sentiment analysis is one of the little-studied tasks in scientometric analysis. For citation analysis, we developed eight datasets comprising citation sentences, which we manually annotated with three sentiment polarities, viz. positive, negative, and neutral. Of the eight datasets, three were developed by considering the whole context of citations. Furthermore, we propose an ensembled feature engineering method that combines word embeddings obtained for texts, part-of-speech tags, and dependency relationships. The ensembled features are used as input to deep learning based approaches for citation sentiment classification, which are in turn compared with a Bag-of-Words approach. Experimental results demonstrate that deep learning is useful for larger numbers of samples, whereas the support vector machine performs best for smaller numbers of samples. Moreover, context-based samples prove to be more effective than context-less samples for citation sentiment analysis.
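As a minimal sketch of what combining word embeddings, part-of-speech tags, and dependency relationships into one feature vector could look like, the snippet below concatenates a toy embedding with one-hot encodings of the two tag types for each token. Concatenation is an assumption made for illustration, since the abstract does not say how the three feature types are combined. The tag inventories, the embedding table, and the function names are hypothetical placeholders.

```python
# All vocabularies and vectors below are made-up placeholders (assumptions).
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV"]
DEP_RELS = ["nsubj", "root", "obj", "amod"]
TOY_EMBEDDINGS = {
    "method": [0.1, 0.3],
    "outperforms": [0.7, -0.2],
}
EMBEDDING_DIM = 2


def one_hot(value, vocab):
    """Return a one-hot list over vocab; unknown values give all zeros."""
    return [1.0 if value == item else 0.0 for item in vocab]


def token_features(word, pos, dep):
    """Concatenate embedding, POS one-hot, and dependency one-hot."""
    embedding = TOY_EMBEDDINGS.get(word.lower(), [0.0] * EMBEDDING_DIM)
    return list(embedding) + one_hot(pos, POS_TAGS) + one_hot(dep, DEP_RELS)


def sentence_features(tagged_tokens):
    """Map (word, pos, dep) triples to one feature vector per token."""
    return [token_features(w, p, d) for w, p, d in tagged_tokens]


example = [
    ("This", "DET", "det"),
    ("method", "NOUN", "nsubj"),
    ("outperforms", "VERB", "root"),
]
features = sentence_features(example)
print(len(features), len(features[0]))
```

Each token yields a vector of length 10 here (2 embedding dimensions, 4 POS slots, 4 dependency slots). In practice the triples would come from a tagger and parser, and the resulting sequence would be fed to the classifier.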
Discourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position in a dependency representation of the rhetorical structure can yield substantial improvements on lexicon-based sentiment analysis. Next, we present a recursive neural network over the RST structure, which offers significant improvements over classification-based methods.
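The idea of reweighting discourse units by their position in a dependency representation of the rhetorical structure can be illustrated with a small sketch. It assumes each discourse unit arrives with its depth in the discourse dependency tree (0 for the root) and that deeper units get smaller weights down to a floor. The toy lexicon, the weighting function and its constants, and all names are assumptions for illustration and are not taken from the abstract.

```python
# Toy sentiment lexicon; words and scores are illustrative assumptions.
LEXICON = {"great": 1.0, "good": 0.5, "boring": -1.0, "bad": -0.5}


def unit_score(text):
    """Sum lexicon scores of the words in one discourse unit."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)


def depth_weight(depth, floor=0.5, scale=6.0):
    """Weight decays linearly with depth but never drops below floor."""
    return max(floor, 1.0 - depth / scale)


def document_score(units):
    """units: list of (text, depth) pairs; depth 0 is the tree root."""
    return sum(depth_weight(depth) * unit_score(text) for text, depth in units)


units = [
    ("The film was great", 0),
    ("although the middle was boring", 3),
]
unweighted = sum(unit_score(text) for text, _ in units)
weighted = document_score(units)
print(unweighted, weighted)
```

Without reweighting, the two units cancel out. With depth-based weights, the unit nearer the root dominates and the document score becomes positive.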
Targeted sentiment analysis (TSA), also known as aspect based sentiment analysis (ABSA), aims at detecting fine-grained sentiment polarity towards targets in a given opinion document. Due to the lack of labeled datasets and effective technology, TSA had been intractable for many years. The newly released datasets and the rapid development of deep learning technologies are key enablers for the recent significant progress made in this area. However, the TSA tasks have been defined in various ways with different understandings towards basic concepts like 'target' and 'aspect'. In this paper, we categorize the different tasks and highlight the differences in the available datasets and their specific tasks. We then further discuss the challenges related to data collection and data annotation which are overlooked in many previous studies.
Aspect-based sentiment analysis (ABSA) typically requires in-domain annotated data for supervised training/fine-tuning. It is a big challenge to scale ABSA to a large number of new domains. This paper aims to train a unified model that can perform zero-shot ABSA without using any annotated data for a new domain. We propose a method called contrastive post-training on review Natural Language Inference (CORN). ABSA tasks can then be cast into NLI for zero-shot transfer. We evaluate CORN on ABSA tasks ranging from aspect extraction (AE) and aspect sentiment classification (ASC) to end-to-end aspect-based sentiment analysis (E2E ABSA); the results show that ABSA can be conducted without any human-annotated ABSA data.
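One way to picture casting aspect sentiment classification into NLI is to treat the review sentence as the premise, build one hypothesis per polarity, and pick the polarity whose hypothesis is most strongly entailed. The sketch below shows only this framing. The hypothesis template and polarity words are assumptions, and `toy_entail_score` is a hypothetical word-overlap stand-in for a real NLI model, not the CORN model.

```python
# Polarity-to-word mapping used in the hypothesis template (assumption).
POLARITY_WORDS = {"positive": "good", "negative": "bad", "neutral": "okay"}


def build_nli_pairs(sentence, aspect):
    """Return {polarity: (premise, hypothesis)} for one aspect."""
    return {
        label: (sentence, f"The {aspect} is {word}.")
        for label, word in POLARITY_WORDS.items()
    }


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {tok.strip(".,!?").lower() for tok in text.split()}


def toy_entail_score(premise, hypothesis):
    """Placeholder scorer: counts shared words. NOT a real NLI model."""
    return len(tokenize(premise) & tokenize(hypothesis))


def classify_aspect(sentence, aspect, entail_score):
    """Pick the polarity whose hypothesis gets the highest score."""
    pairs = build_nli_pairs(sentence, aspect)
    return max(pairs, key=lambda label: entail_score(*pairs[label]))


prediction = classify_aspect("The battery is good.", "battery", toy_entail_score)
print(prediction)
```

Because `entail_score` is passed in as a parameter, the overlap scorer can be swapped for any function that returns an entailment score for a premise and hypothesis.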
Targeted sentiment analysis is the task of jointly predicting target entities and their associated sentiment information. Existing research efforts mostly regard this joint task as a sequence labeling problem, building models that can capture explicit structures in the output space. However, the importance of capturing implicit global structural information that resides in the input space is largely unexplored. In this work, we argue that both types of information (implicit and explicit structural information) are crucial for building a successful targeted sentiment analysis model. Our experimental results show that properly capturing both types of information leads to better performance than competitive existing approaches. We also conduct extensive experiments to investigate our model's effectiveness and robustness.
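To make the sequence labeling view concrete, the sketch below decodes a tag sequence into (target, sentiment) pairs. It assumes a BIO-style scheme in which each tag carries a polarity suffix, such as `B-POS` or `I-NEG`. This scheme and the function name are assumptions chosen for illustration, since the abstract does not specify a tag set.

```python
def decode_targets(tokens, tags):
    """Turn BIO tags with polarity suffixes into (target, polarity) pairs."""
    spans = []
    current, polarity = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new target begins; close any target that is still open.
            if current:
                spans.append((" ".join(current), polarity))
            current, polarity = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == polarity:
            # Continue the open target only if the polarity matches.
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag closes the open target.
            if current:
                spans.append((" ".join(current), polarity))
            current, polarity = [], None
    if current:
        spans.append((" ".join(current), polarity))
    return spans


tokens = ["I", "love", "New", "York", "but", "hate", "traffic"]
tags = ["O", "O", "B-POS", "I-POS", "O", "O", "B-NEG"]
targets = decode_targets(tokens, tags)
print(targets)
```

Predicting the boundary and the polarity in a single tag is what makes the task joint: one labeling pass yields both the target entities and their sentiment.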
The Internet and the proliferation of smart mobile devices have changed the way information is created, shared, and spread; e.g., microblogs such as Twitter, weblogs such as LiveJournal, social networks such as Facebook, and instant messengers such as Skype and WhatsApp are now commonly used to share thoughts and opinions about anything in the surrounding world. This has resulted in the proliferation of social media content, thus creating new opportunities to study public opinion at a scale that was never possible before. Naturally, this abundance of data has quickly attracted business and research interest from various fields including marketing, political science, and social studies, among many others, which are interested in questions like these: Do people like the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel about Brexit? Answering these questions requires studying the sentiment of opinions people express in social media. This has given rise to the fast growth of the field of sentiment analysis in social media, with Twitter being especially popular for research due to its scale, representativeness, variety of topics discussed, as well as ease of public access to its messages. Here we present an overview of work on sentiment analysis on Twitter.
Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, attitudes, and emotions. Songs are important to sentiment analysis since songs and mood are mutually dependent. Based on the selected song, it becomes easy to find the mood of the listener, which in future can be used for recommendation. Song lyrics are a rich source of data, containing words that are helpful in the analysis and classification of the sentiments generated from them. Nowadays we observe a lot of inter-sentential and intra-sentential code-mixing in songs, which has a varying impact on the audience. To study this impact, we created a Telugu songs dataset containing both Telugu-English code-mixed and pure Telugu songs. In this paper, we classify songs by arousal as exciting or non-exciting. We develop a language identification tool and introduce code-mixing features obtained from it as additional features. Our system with these additional features attains 4-5% higher accuracy than traditional approaches on our dataset.
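As a rough illustration of how code-mixing features might be derived from language identification output, the sketch below assumes each lyric token has already been tagged as Telugu (`te`) or English (`en`). It computes the English token fraction and the number of language switch points. These two features, the tag names, and the function name are assumptions for illustration and are not necessarily the features used in the paper.

```python
def code_mixing_features(lang_tags):
    """Compute simple code-mixing features from per-token language tags."""
    total = len(lang_tags)
    if total == 0:
        return {"en_fraction": 0.0, "switch_points": 0, "is_code_mixed": False}
    english = sum(1 for tag in lang_tags if tag == "en")
    # A switch point is any position where the language changes.
    switches = sum(1 for a, b in zip(lang_tags, lang_tags[1:]) if a != b)
    return {
        "en_fraction": english / total,
        "switch_points": switches,
        "is_code_mixed": len(set(lang_tags)) > 1,
    }


# Hypothetical tags for a five-token lyric line.
line_tags = ["te", "te", "en", "en", "te"]
mixed = code_mixing_features(line_tags)
pure = code_mixing_features(["te", "te", "te"])
print(mixed, pure)
```

Features like these can be appended to lexical features before classification, so a classifier can distinguish pure Telugu lyrics from code-mixed ones.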
A movie that is thoroughly enjoyed and recommended by one individual might be hated by another. One characteristic of humans is the ability to have feelings, which can be positive or negative. To automatically classify and study human feelings, sentiment analysis and opinion mining, an aspect of natural language processing, were designed to understand human feelings regarding issues that could affect a product, a social media platform, a government, societal discussions, or even movies. Much work on sentiment analysis has been done on high-resource languages, while low-resource languages like Yoruba have been sidelined. Due to the scarcity of datasets and linguistic architectures suited to low-resource languages, African languages, as low-resource languages, have been ignored and not fully explored. For this reason, we focus on Yoruba to explore sentiment analysis on reviews of Nigerian movies. The data comprise 1500 movie reviews sourced from IMDB, Rotten Tomatoes, Letterboxd, Cinemapointer, and Nollyrated. We develop sentiment classification models using state-of-the-art pre-trained language models such as mBERT and AfriBERTa to classify the movie reviews.
Data-driven statistical Natural Language Processing (NLP) techniques leverage large amounts of language data to build models that can understand language. However, most language data reflect the public discourse at the time the data was produced, and hence NLP models are susceptible to learning incidental associations around named referents at a particular point in time, in addition to general linguistic meaning. An NLP system designed to model notions such as sentiment and toxicity should ideally produce scores that are independent of the identity of such entities mentioned in text and their social associations. For example, in a general-purpose sentiment analysis system, a phrase such as "I hate Katy Perry" should be interpreted as having the same sentiment as "I hate Taylor Swift". Based on this idea, we propose a generic evaluation framework, Perturbation Sensitivity Analysis, which detects unintended model biases related to named entities, and requires no new annotations or corpora. We demonstrate the utility of this analysis by employing it on two different NLP models, a sentiment model and a toxicity model, applied to online comments in English from four different genres.
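The core idea of perturbing named entities can be sketched as follows: substitute different names into the same sentence, score each variant with the model under test, and summarize how much the scores move. The snippet is a minimal illustration under stated assumptions. `toy_sentiment` is a hypothetical scorer with a deliberately injected name bias, and the range and standard deviation used as summaries are illustrative choices rather than the paper's exact metrics.

```python
from statistics import pstdev


def toy_sentiment(text):
    """Stand-in scorer with a deliberately injected name bias (hypothetical)."""
    lowered = text.lower()
    score = 0.0
    if "hate" in lowered:
        score -= 1.0
    if "love" in lowered:
        score += 1.0
    if "Taylor" in text:  # injected bias, for demonstration only
        score += 0.3
    return score


def perturbation_sensitivity(template, names, model):
    """Score the template once per name and summarize the spread."""
    scores = {name: model(template.format(name=name)) for name in names}
    values = list(scores.values())
    return {
        "scores": scores,
        "range": max(values) - min(values),
        "std": pstdev(values),
    }


names = ["Katy Perry", "Taylor Swift"]
report = perturbation_sensitivity("I hate {name}", names, toy_sentiment)
print(report)
```

A model that ignores the identity of the entity would give a range of zero. A non-zero spread across names flags an association with specific entities, and computing it needs no new annotations because only the model's own scores are compared.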