"Sentiment": models, code, and papers

Sentence-Level Sentiment Analysis of Financial News Using Distributed Text Representations and Multi-Instance Learning

Dec 31, 2018
Bernhard Lutz, Nicolas Pröllochs, Dirk Neumann

Researchers and financial professionals require robust computerized tools that allow users to rapidly operationalize and assess the semantic textual content in financial news. However, existing methods commonly work at the document-level while deeper insights into the actual structure and the sentiment of individual sentences remain blurred. As a result, investors are required to apply the utmost attention and detailed, domain-specific knowledge in order to assess the information on a fine-grained basis. To facilitate this manual process, this paper proposes the use of distributed text representations and multi-instance learning to transfer information from the document-level to the sentence-level. Compared to alternative approaches, this method features superior predictive performance while preserving context and interpretability. Our analysis of a manually-labeled dataset yields a predictive accuracy of up to 69.90%, exceeding the performance of alternative approaches by at least 3.80 percentage points. Accordingly, this study not only benefits investors with regard to their financial decision-making, but also helps companies to communicate their messages as intended.

Detecting Group Beliefs Related to 2018's Brazilian Elections in Tweets: A Combined Study on Modeling Topics and Sentiment Analysis

May 31, 2020
Brenda Salenave Santana, Aline Aver Vanin

The 2018 Brazilian presidential elections highlighted the influence of alternative media and social networks, such as Twitter. In this work, we analyze politically motivated discourses related to the second round of the Brazilian elections. In order to verify whether similar discourses reinforce group engagement with personal beliefs, we collected a set of tweets related to political hashtags at that moment. To this end, we combined a topic modeling approach with opinion mining techniques to analyze the politically motivated discourses. Using SentiLex-PT, a Portuguese sentiment lexicon, we extracted from the dataset the top 5 most frequent groups of words related to opinions. Applying a bag-of-words model, we computed the cosine similarity between each opinion and the observed groups. This study allowed us to observe an exacerbated use of passionate discourses in the digital political scenario as a form of appreciation of and engagement with the groups that convey similar beliefs.

* Proceedings of the Workshop on Digital Humanities and Natural Language Processing (DHandNLP 2020) co-located with International Conference on the Computational Processing of Portuguese (PROPOR 2020) 
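
The bag-of-words cosine similarity step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the whitespace tokenizer, the function names, and the English example strings are all assumptions made for the sketch (the study itself works on Portuguese tweets and SentiLex-PT word groups).

```python
import math
from collections import Counter


def bag_of_words(text):
    """Token counts from a lowercased, whitespace-split text (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical opinion text and word group, for illustration only.
opinion = bag_of_words("great candidate great hope")
group = bag_of_words("great hope for change")
score = cosine_similarity(opinion, group)
```

Scores lie between 0 (no shared words) and 1 (identical word proportions), so each opinion can be compared against every observed group and assigned to the closest one.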

Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis

Aug 06, 2018
Andrew Moore, Paul Rayson

Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing. Language models and learning methods are so complex that scientific conference papers no longer contain enough space for the technical depth required for replication or reproduction. Taking Target Dependent Sentiment Analysis as a case study, we show how recent work in the field has not consistently released code, or described settings for learning methods in enough detail, and lacks comparability and generalisability in train, test or validation data. To investigate generalisability and to enable state of the art comparative evaluations, we carry out the first reproduction studies of three groups of complementary methods and perform the first large-scale mass evaluation on six different English datasets. Reflecting on our experiences, we recommend that future replication or reproduction experiments should always consider a variety of datasets alongside documenting and releasing their methods and published code in order to minimise the barriers to both repeatability and generalisability. We have released our code with a model zoo on GitHub with Jupyter Notebooks to aid understanding and full documentation, and we recommend that others do the same with their papers at submission time through an anonymised GitHub account.

* COLING 2018. Code available at: 

Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature

Jan 12, 2022
Arthur M. Jacobs, Annette Kinder

The Gutenberg Literary English Corpus (GLEC, Jacobs, 2018a) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. In this study we address differences among the different literature categories in GLEC, as well as differences between authors. We report the results of three studies providing i) topic and sentiment analyses for six text categories of GLEC (i.e., children and youth, essays, novels, plays, poems, stories) and its >100 authors, ii) novel measures of semantic complexity as indices of the literariness, creativity and book beauty of the works in GLEC (e.g., Jane Austen's six novels), and iii) two experiments on text classification and authorship recognition using novel features of semantic complexity. The data on two novel measures estimating a text's literariness, intratextual variance and stepwise distance (van Cranenburgh et al., 2019) revealed that plays are the most literary texts in GLEC, followed by poems and novels. Computation of a novel index of text creativity (Gray et al., 2016) revealed poems and plays as the most creative categories with the most creative authors all being poets (Milton, Pope, Keats, Byron, or Wordsworth). We also computed a novel index of perceived beauty of verbal art (Kintsch, 2012) for the works in GLEC and predict that Emma is the theoretically most beautiful of Austen's novels. Finally, we demonstrate that these novel measures of semantic complexity are important features for text classification and authorship recognition with overall predictive accuracies in the range of .75 to .97. Our data pave the way for future computational and empirical studies of literature or experiments in reading psychology and offer multiple baselines and benchmarks for analysing and validating other book corpora.

* 37 pages, 12 figures 

CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM

May 23, 2019
Rohit Gavval, Vadlamani Ravi, Kalavala Revanth Harshal, Akhilesh Gangwar, Kumar Ravi

With the widespread use of social media, companies now have access to a wealth of customer feedback data which has valuable applications to Customer Relationship Management (CRM). Analyzing customer grievance data is paramount, as failure to redress grievances speedily leads to customer churn and lower profitability. In this paper, we propose a descriptive analytics framework using a Self-organizing feature map (SOM) for Visual Sentiment Analysis of customer complaints. The network learns the inherent grouping of the complaints automatically, which can then be visualized using various techniques. Analytical Customer Relationship Management (ACRM) executives can draw useful business insights from the maps and take timely remedial action. We also propose a high-performance version of the algorithm, CUDASOM (CUDA based Self Organizing feature Map), implemented using NVIDIA's parallel computing platform, CUDA, which speeds up the processing of high-dimensional text data and generates fast results. The efficacy of the proposed model has been demonstrated on customer complaints data regarding the products and services of four leading Indian banks. CUDASOM achieved an average speed up of 44 times. Our approach can expand research into intelligent grievance redressal systems that provide rapid solutions to complaining customers.

Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English

Oct 13, 2020
BalaSundaraRaman Lakshmanan, Sanjeeth Kumar Ravindranath

Theedhum Nandrum is a sentiment polarity detection system using two approaches: a Stochastic Gradient Descent (SGD) based classifier and a Long Short-term Memory (LSTM) based classifier. Our approach utilises language features like the use of emoji, choice of scripts and code mixing, which appeared quite marked in the datasets specified for the Dravidian Codemix - FIRE 2020 task. The hyperparameters for the SGD were tuned using GridSearchCV. Our system was ranked 4th in Tamil-English with a weighted average F1 score of 0.62 and 9th in Malayalam-English with a score of 0.65. We achieved a weighted average F1 score of 0.77 for Tamil-English using a Logistic Regression based model after the task deadline. This performance betters the top ranked classifier on this dataset by a wide margin. Our use of language-specific Soundex to harmonise the spelling variants in code-mixed data appears to be a novel application of Soundex. Our complete code is published on GitHub at

* FIRE 2020, December 16-20, 2020, Hyderabad, India 
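
To illustrate how a phonetic key can harmonise spelling variants, here is a minimal sketch of the classic English (American) Soundex algorithm. It is a stand-in chosen for illustration: the paper describes language-specific Soundex variants whose rules are not given in this abstract, and the romanized word pair in the usage lines is a hypothetical example.

```python
# Classic American Soundex; an illustrative stand-in, NOT the
# language-specific variant the abstract refers to.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key of an ASCII word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


# Hypothetical romanized spelling variants collapsing to one key.
same_key = soundex("nalla") == soundex("nala")
```

Replacing each token with its phonetic key before feature extraction lets a classifier treat such variants as the same feature, at the cost of occasionally merging unrelated words.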

Architecture of Text Mining Application in Analyzing Public Sentiments of West Java Governor Election using Naive Bayes Classification

Sep 20, 2018
Suryanto Nugroho, Prihandoko

The West Java gubernatorial election is an event that seizes the attention of the public, and social media users are no exception. Public opinion on a prospective regional leader can help predict electability and voter tendencies. Data for the opinion mining process can be obtained from Twitter. Because the data is highly varied in form and unstructured, it must be managed and transformed into semi-structured data using pre-processing techniques. This semi-structured information then passes to a classification stage that categorizes each opinion as negative or positive. The research methodology is a literature study that examines previous research on similar topics. The purpose of this study is to find a suitable architecture for a Twitter opinion mining application that gauges public sentiment toward the West Java governor election. The result of this research is that Twitter opinion mining is a part of text mining in which opinions on Twitter must first pass through a text preprocessing stage before they can be classified. The preprocessing steps required for Twitter data are cleansing, case folding, POS tagging and stemming. The resulting text mining architecture can also be used for text mining research on different topics.

* 5 Pages 
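
The first two preprocessing steps named in the abstract, cleansing and case folding, can be sketched with the standard library alone. This is a hedged illustration under stated assumptions: the regular expressions, the `preprocess` function, and the sample tweet are hypothetical, and POS tagging and stemming are left out because they need a language-specific NLP toolkit that the abstract does not name.

```python
import re

# Assumed cleansing rules; the paper does not list its exact patterns.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def preprocess(tweet):
    """Cleanse and case-fold a tweet, returning a list of tokens."""
    text = URL_RE.sub(" ", tweet)       # cleansing: drop links
    text = MENTION_RE.sub(" ", text)    # cleansing: drop user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = text.lower()                 # case folding
    text = NON_ALNUM_RE.sub(" ", text)  # cleansing: drop punctuation and symbols
    return text.split()


# Hypothetical tweet, for illustration only.
tokens = preprocess("RT @user: Pilgub JABAR seru! https://t.co/abc #Jabar2018")
```

The resulting token list is the semi-structured form that a classifier such as Naive Bayes would consume after the remaining tagging and stemming steps.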

Deep Sentiment Classification and Topic Discovery on Novel Coronavirus or COVID-19 Online Discussions: NLP Using LSTM Recurrent Neural Network Approach

Apr 24, 2020
Hamed Jelodar, Yongli Wang, Rita Orji, Hucheng Huang

Internet forums and public social media, such as online healthcare forums, provide a convenient channel for users (people/patients) concerned about health issues to discuss and share information with each other. In late December 2019, an outbreak of a novel coronavirus (infection from which results in the disease named COVID-19) was reported, and, due to the rapid spread of the virus in other parts of the world, the World Health Organization declared a state of emergency. In this paper, we used automated extraction of COVID-19 related discussions from social media and a natural language processing (NLP) method based on topic modeling to uncover various issues related to COVID-19 from public opinions. Moreover, we also investigate how to use an LSTM recurrent neural network for sentiment classification of COVID-19 comments. Our findings shed light on the importance of using public opinions and suitable computational techniques to understand issues surrounding COVID-19 and to guide related decision-making.

UPB at SemEval-2020 Task 9: Identifying Sentiment in Code-Mixed Social Media Texts using Transformers and Multi-Task Learning

Sep 06, 2020
George-Eduard Zaharia, George-Alexandru Vlad, Dumitru-Clementin Cercel, Traian Rebedea, Costin-Gabriel Chiru

Sentiment analysis is a process widely used in opinion mining campaigns conducted today. It has applications in a variety of fields, especially in collecting information related to the attitude or satisfaction of users concerning a particular subject. However, the task of managing such a process becomes noticeably more difficult when it is applied in cultures that tend to combine two languages in order to express ideas and thoughts. By interleaving words from two languages, users can express themselves with ease, but at the cost of making the text far less intelligible both for those who are not familiar with this technique and for standard opinion mining algorithms. In this paper, we describe the systems developed by our team for SemEval-2020 Task 9, which aims to cover two well-known code-mixed languages: Hindi-English and Spanish-English. We intend to solve this issue by introducing a solution that takes advantage of several neural network approaches, as well as pre-trained word embeddings. Our approach (multilingual BERT) achieves promising performance on the Hindi-English task, with an average F1-score of 0.6850, registered on the competition leaderboard, ranking our team 16th out of 62 participants. For the Spanish-English task, we obtained an average F1-score of 0.7064, ranking our team 17th out of 29 participants, by using another multilingual Transformer-based model, XLM-RoBERTa.

* Accepted at SemEval-2020, 9 pages, 4 tables 

Comparison of SVM Optimization Techniques in the Primal

Jun 28, 2014
Jonathan Katzman, Diane Duros

This paper examines the efficacy of different optimization techniques in a primal formulation of a support vector machine (SVM). Three main techniques are compared. The dataset used to compare all three techniques was the Sentiment Analysis on Movie Reviews dataset, from
