
"Sentiment": models, code, and papers

Sentiment Analysis of Czech Texts: An Algorithmic Survey

Jan 09, 2019
Erion Çano, Ondřej Bojar

In the area of online communication, commerce and transactions, analyzing the sentiment polarity of texts written in various natural languages has become crucial. While there have been many contributions in resources and studies for the English language, "smaller" languages like Czech have not received much attention. In this survey, we explore the effectiveness of many existing machine learning algorithms for sentiment analysis of Czech Facebook posts and product reviews. We report the set of optimal parameter values for each algorithm and its scores on both datasets. Finally, we observe that support vector machines are the best classifier and that attempts to increase performance further with bagging, boosting or voting ensemble schemes fail to do so.
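
The kind of comparison the survey performs can be sketched with scikit-learn. Everything below is illustrative: the toy texts, labels, and untuned parameters stand in for the paper's actual Czech datasets and reported settings.

```python
# Illustrative sketch: a linear SVM versus a voting ensemble on toy data.
# The texts, labels, and parameters are invented; the paper's Czech
# Facebook/review datasets and tuned parameter values are not reproduced.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

texts = [
    "great product works perfectly",
    "excellent quality very happy",
    "terrible waste of money",
    "awful broke after one day",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Baseline: the SVM the survey finds strongest.
svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(texts, labels)

# A hard-voting ensemble of classic learners, as explored in the paper.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier([
        ("lr", LogisticRegression()),
        ("nb", MultinomialNB()),
    ]),
)
ensemble.fit(texts, labels)

print(svm.predict(["very happy with the quality"]))   # likely [1]
print(ensemble.predict(["terrible broke quickly"]))   # likely [0]
```

On real data one would grid-search each classifier's parameters per dataset, as the survey reports doing.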

* 7 pages, 2 figures, 7 tables. To appear in: 11th International Conference on Agents and Artificial Intelligence - ICAART 2019 


On the logistical difficulties and findings of Jopara Sentiment Analysis

May 11, 2021
Marvin M. Agüero-Torales, David Vilares, Antonio G. López-Herrera

This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning ones in this low-resource setup. Transformer architectures obtain the best results, despite not considering Guarani during pre-training, but traditional machine learning models remain competitive due to the low-resource nature of the problem.

* Proceedings on CALCS 2021 (co-located with NAACL 2021) - Fifth Workshop on Computational Approaches to Linguistic Code Switching 


Migration and Refugee Crisis: a Critical Analysis of Online Public Perception

Jul 20, 2020
Isa Inuwa-Dutse, Mark Liptrott, Ioannis Korkontzelos

The migration rate and the level of resentment towards migrants are important issues in modern civilisation. The infamous EU refugee crisis caught many countries unprepared, leading to sporadic and rudimentary containment measures that, in turn, led to significant public discourse. Decades of offline data collected via traditional survey methods have previously been used to understand public opinion and foster peaceful coexistence. Capturing and understanding online public opinion via social media is crucial towards a joint strategic regulation spanning safety, the rights of migrants, and cordial integration for economic prosperity. We present an analysis of opinions on migrants and refugees expressed by users of a very popular social platform, Twitter. We analyse sentiment and the associated context of expressions in a vast collection of tweets related to the EU refugee crisis. Our study reveals a marginally higher proportion of negative sentiment towards migrants, with that negative sentiment concentrated among ordinary users. Users with many followers and non-governmental organisations (NGOs) tend to tweet favourably about the topic, offsetting the distribution of negative sentiment. We suggest that they can be encouraged to be more proactive in neutralising negative attitudes that may arise in similar incidents.

* 15 pages, 8 figures 


DynaSent: A Dynamic Benchmark for Sentiment Analysis

Dec 30, 2020
Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela

We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilitates human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers, and its development and test splits are designed to produce chance performance for even the best models we have been able to develop; when future models solve this task, we will use them to create DynaSent version 2, continuing the dynamic evolution of this benchmark. Here, we report on the dataset creation effort, focusing on the steps we took to increase quality and reduce artifacts. We also present evidence that DynaSent's Neutral category is more coherent than the comparable category in other benchmarks, and we motivate training models from scratch for each round over successive fine-tuning.


Efficient text generation of user-defined topic using generative adversarial networks

Jun 22, 2020
Chenhan Yuan, Yi-chin Huang, Cheng-Hung Tsai

This study focused on efficient text generation using generative adversarial networks (GANs). Assuming that the goal is to generate a paragraph on a user-defined topic with a given sentimental tendency, conventionally the whole network has to be re-trained each time a user changes the topic, which is time-consuming and impractical. Therefore, we propose a User-Defined GAN (UD-GAN) with two-level discriminators to solve this problem. The first discriminator, constructed from multiple LSTMs, guides the generator to learn paragraph-level information and sentence syntactic structure. The second one copes with higher-level information, such as the user-defined sentiment and topic for text generation. Cosine similarity based on TF-IDF, together with a length penalty, is adopted to determine the relevance of the generated text to the topic. Then, only the second discriminator is re-trained with the generator if the topic or sentiment for text generation is modified. System evaluations are conducted to compare the performance of the proposed method with other GAN-based ones. The objective results show that the proposed method generates texts in less time than the others and that the generated text is related to the user-defined topic and sentiment. We will further investigate the possibility of incorporating more detailed paragraph information, such as semantics, into text generation to enhance the results.
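
The topic-relevance signal used by the second discriminator can be sketched as follows. This is a minimal stdlib-only sketch: the tokenization, IDF smoothing, and exponential penalty shape are assumptions, not the paper's exact formulation.

```python
# Sketch of a TF-IDF cosine similarity with a length penalty, the kind of
# topic-relevance score UD-GAN's second discriminator relies on.
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed IDF
    return [{t: (tf / len(doc)) * idf[t] for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def topic_relevance(generated, topic, alpha=0.01, target_len=20):
    """Cosine similarity of generated text vs. topic, discounted by length."""
    gen_toks, topic_toks = generated.split(), topic.split()
    v_gen, v_topic = tf_idf_vectors([gen_toks, topic_toks])
    penalty = math.exp(-alpha * abs(len(gen_toks) - target_len))
    return cosine(v_gen, v_topic) * penalty

on_topic = topic_relevance("the restaurant food was great and the service friendly",
                           "food restaurant service")
off_topic = topic_relevance("cold windy rain all day today",
                            "food restaurant service")
print(on_topic, off_topic)  # on_topic > off_topic; off_topic is 0.0
```

Text sharing no terms with the topic scores exactly zero, so the discriminator's gradient pushes the generator toward topical vocabulary.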

* Accepted by CC-NLG 2019 (Workshop on Computational Creativity in Natural Language Generation) 


TransRev: Modeling Reviews as Translations from Users to Items

Apr 18, 2018
Alberto Garcia-Duran, Roberto Gonzalez, Daniel Onoro-Rubio, Mathias Niepert, Hui Li

The text of a review expresses the sentiment a customer has towards a particular product. This is exploited in sentiment analysis, where machine learning models are used to predict the review score from the text of the review. Furthermore, the products customers have purchased in the past are indicative of the products they will purchase in the future. This is what recommender systems exploit by learning models from purchase information to predict the items a customer might be interested in. We propose TransRev, an approach to the product recommendation problem that integrates ideas from recommender systems, sentiment analysis, and multi-relational learning into a joint learning objective. TransRev learns vector representations for users, items, and reviews. The embedding of a review is learned such that (a) it performs well as an input feature of a regression model for sentiment prediction; and (b) it always translates the reviewer embedding to the embedding of the reviewed item. This allows TransRev to approximate a review embedding at test time as the difference between the embedding of each item and the user embedding. The approximated review embedding is then used with the regression model to predict the review score for each item. TransRev outperforms state-of-the-art recommender systems on a large number of benchmark datasets. Moreover, it is able to retrieve, for each user and item, the review text from the training set whose embedding is most similar to the approximated review embedding.
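
The core translation trick (user + review ≈ item, so an unseen review embedding can be approximated at test time as item − user) can be sketched as below. The embeddings and regression weights are toy values, not learned TransRev parameters.

```python
# Toy illustration of TransRev's translation idea.
# Training enforces: user_vec + review_vec ≈ item_vec.
# At test time no review exists yet, so it is approximated instead.

def approx_review(user_vec, item_vec):
    """Approximate the unseen review embedding as item - user."""
    return [i - u for u, i in zip(user_vec, item_vec)]

def predict_score(review_vec, weights, bias):
    """Linear regression head mapping a review embedding to a rating."""
    return bias + sum(w * r for w, r in zip(weights, review_vec))

# Hypothetical learned embeddings (2-d for readability).
user = [1.0, 0.0]
item = [3.0, 2.0]
w, b = [0.5, 0.5], 1.0

review_hat = approx_review(user, item)    # [2.0, 2.0]
rating = predict_score(review_hat, w, b)  # 1 + 0.5*2 + 0.5*2 = 3.0
print(review_hat, rating)
```

Ranking candidates for a user then amounts to repeating this for each item, and the nearest training-set review embedding to `review_hat` yields the retrievable review text mentioned in the abstract.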


Optimization Of Cross Domain Sentiment Analysis Using Sentiwordnet

Oct 08, 2013
K Paramesha, K C Ravishankar

Sentiment analysis of reviews is typically carried out using manually built or automatically generated lexicon resources, in which terms are matched against the lexicon to compute term counts for positive and negative polarity. SentiWordNet, on the other hand, differs from other lexicon resources in that it gives scores (weights) for the positive and negative polarity of each word. Each polarity score, for positive, negative and neutral orientation, ranges between 0 and 1 and indicates the strength of the word in that sentiment orientation. In this paper, we show how SentiWordNet can be used to enhance classification performance at both the sentence and document level.
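
The weighted-score idea the paper builds on can be illustrated with a toy stand-in for SentiWordNet. The (positive, negative) weights below are invented for illustration; real SentiWordNet assigns such scores per synset, not per surface word.

```python
# Toy stand-in for SentiWordNet: each word carries (positive, negative)
# weights in [0, 1] rather than a single binary polarity label.
# These example weights are invented; real SentiWordNet scores synsets.
LEXICON = {
    "good":     (0.75, 0.00),
    "great":    (0.88, 0.00),
    "bad":      (0.00, 0.62),
    "terrible": (0.00, 0.91),
    "movie":    (0.00, 0.00),  # objective word: neutral weights
}

def sentence_polarity(sentence):
    """Sum weighted polarity scores instead of counting lexicon matches."""
    pos = neg = 0.0
    for word in sentence.lower().split():
        p, n = LEXICON.get(word, (0.0, 0.0))
        pos += p
        neg += n
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(sentence_polarity("a great movie"))     # positive
print(sentence_polarity("a terrible movie"))  # negative
```

The difference from plain term counting is that "great" contributes more positive evidence than a mildly positive word would, which is exactly the extra signal the paper exploits.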

* International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.5, September 2013 


Assessment of Massively Multilingual Sentiment Classifiers

Apr 11, 2022
Krzysztof Rajda, Łukasz Augustyniak, Piotr Gramacki, Marcin Gruza, Szymon Woźniak, Tomasz Kajdanowicz

Models are increasing in size and complexity in the hunt for SOTA. But what if a 2% increase in performance makes no difference in a production use case? Maybe the benefits of a smaller, faster model outweigh those slight performance gains. Likewise, equally good performance across languages in multilingual tasks is more important than SOTA results in a single one. We present the biggest unified, multilingual collection of sentiment analysis datasets. We use it to assess 11 models on 80 high-quality sentiment datasets (out of 342 raw datasets collected) in 27 languages, and include results on internally annotated datasets. We thoroughly evaluate multiple setups, including fine-tuning transformer-based models, to measure performance. We compare results along numerous dimensions, addressing the imbalance in both language coverage and dataset sizes. Finally, we present some best practices for working with such a massive collection of datasets and models from a multilingual perspective.

* Accepted for WASSA at ACL 2022 


Decoding Sentiment from Distributed Representations of Sentences

Jun 16, 2017
Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Distributed representations of sentences have been developed recently to represent their meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, the order of sentences, or neither) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no 'one-size-fits-all' representation architecture outperforming the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
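
The additive composition baseline the paper finds surprisingly strong (averaging skip-gram word vectors into a sentence vector, then training a classifier on top) can be sketched as follows. The 3-d word vectors here are invented stand-ins for real skip-gram embeddings.

```python
# Additive composition: a sentence vector is the average of its word vectors.
# The vectors below are invented; real skip-gram embeddings would be learned
# from a corpus and typically have 100-300 dimensions.
WORD_VECS = {
    "not":  [-1.0, 0.2, 0.0],
    "good": [0.9, 0.5, 0.1],
    "film": [0.0, 0.1, 0.8],
}

def sentence_vector(tokens, dim=3):
    """Average the vectors of known tokens (zero vector if none are known)."""
    known = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

vec = sentence_vector(["not", "good", "film"])
print(vec)  # approximately [-0.033, 0.267, 0.3]
```

A linear classifier trained on such averaged vectors is the additive model the paper evaluates; note that averaging is order-insensitive, which is why negation handling varies so much across languages.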
