Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Sentiment": models, code, and papers

Using Nuances of Emotion to Identify Personality

Sep 24, 2013
Saif M. Mohammad, Svetlana Kiritchenko

Past work on personality detection has shown that frequency of lexical categories such as first person pronouns, past tense verbs, and sentiment words have significant correlations with personality traits. In this paper, for the first time, we show that fine affect (emotion) categories such as that of excitement, guilt, yearning, and admiration are significant indicators of personality. Additionally, we perform experiments to show that the gains provided by the fine affect categories are not obtained by using coarse affect categories alone or with specificity features alone. We employ these features in five SVM classifiers for detecting five personality traits through essays. We find that the use of fine emotion features leads to statistically significant improvement over a competitive baseline, whereas the use of coarse affect and specificity features does not.

* In Proceedings of the ICWSM Workshop on Computational Personality Recognition, July 2013, Boston, USA 

  Access Paper or Ask Questions

Prompt-based Pre-trained Model for Personality and Interpersonal Reactivity Prediction

Mar 23, 2022
Bin Li, Yixuan Weng, Qiya Song, Fuyan Ma, Bin Sun, Shutao Li

This paper describes the LingJing team's method to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Personality Prediction (PER) and Reactivity Index Prediction (IRI). In this paper, we adopt the prompt-based method with the pre-trained language model to accomplish these tasks. Specifically, the prompt is designed to provide the extra knowledge for enhancing the pre-trained model. Data augmentation and model ensemble are adopted for obtaining better results. Extensive experiments are performed, which shows the effectiveness of the proposed method. On the final submission, our system achieves a Pearson Correlation Coefficient of 0.2301 and 0.2546 on Track 3 and Track 4 respectively. We ranked Top-1 on both sub-tasks.

* The shared task paper described the contributions of the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) @ ACL-2022 

  Access Paper or Ask Questions

Privacy enabled Financial Text Classification using Differential Privacy and Federated Learning

Oct 04, 2021
Priyam Basu, Tiasa Singha Roy, Rakshit Naidu, Zumrut Muftuoglu

Privacy is important considering the financial Domain as such data is highly confidential and sensitive. Natural Language Processing (NLP) techniques can be applied for text classification and entity detection purposes in financial domains such as customer feedback sentiment analysis, invoice entity detection, categorisation of financial documents by type etc. Due to the sensitive nature of such data, privacy measures need to be taken for handling and training large models with such data. In this work, we propose a contextualized transformer (BERT and RoBERTa) based text classification model integrated with privacy features such as Differential Privacy (DP) and Federated Learning (FL). We present how to privately train NLP models and desirable privacy-utility tradeoffs and evaluate them on the Financial Phrase Bank dataset.

* 4 pages. Accepted at ECONLP-EMNLP'21 

  Access Paper or Ask Questions

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

Oct 26, 2020
Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, neither a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models, and continue training them on Twitter corpora.

* Findings of EMNLP 2020. TweetEval benchmark available at 

  Access Paper or Ask Questions

Improving Indonesian Text Classification Using Multilingual Language Model

Sep 12, 2020
Ilham Firdausi Putra, Ayu Purwarianti

Compared to English, the amount of labeled data for Indonesian text classification tasks is very small. Recently developed multilingual language models have shown its ability to create multilingual representations effectively. This paper investigates the effect of combining English and Indonesian data on building Indonesian text classification (e.g., sentiment analysis and hate speech) using multilingual language models. Using the feature-based approach, we observe its performance on various data sizes and total added English data. The experiment showed that the addition of English data, especially if the amount of Indonesian data is small, improves performance. Using the fine-tuning approach, we further showed its effectiveness in utilizing the English language to build Indonesian text classification models.

* 2020 International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA) 

  Access Paper or Ask Questions

AraNet: A Deep Learning Toolkit for Arabic Social Media

Dec 30, 2019
Muhammad Abdul-Mageed, Chiyu Zhang, Azadeh Hashemi, El Moatez Billah Nagoudi

We describe AraNet, a collection of deep learning Arabic social media processing tools. Namely, we exploit an extensive host of publicly available and novel social media datasets to train bidirectional encoders from transformer models (BERT) to predict age, dialect, gender, emotion, irony, and sentiment. AraNet delivers state-of-the-art performance on a number of the cited tasks and competitively on others. In addition, AraNet has the advantage of being exclusively based on a deep learning framework and hence feature engineering free. To the best of our knowledge, AraNet is the first to performs predictions across such a wide range of tasks for Arabic NLP and thus meets a critical needs. We publicly release AraNet to accelerate research and facilitate comparisons across the different tasks.

  Access Paper or Ask Questions

BERTje: A Dutch BERT Model

Dec 19, 2019
Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at

  Access Paper or Ask Questions

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

Sep 17, 2019
Prashanth Vijayaraghavan, Deb Roy

Recently, generating adversarial examples has become an important means of measuring robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and further counter those vulnerabilities by applying adversarial training techniques. In natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement learning based approach towards generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models for (a) IMDB sentiment classification task and (b) AG's news corpus news categorization task with significantly high success rates. We find that the adversarial examples generated are semantics-preserving perturbations to the original text.

* 16 pages, 3 figures, ECML PKDD 2019 

  Access Paper or Ask Questions

Examining Structure of Word Embeddings with PCA

May 31, 2019
Tomáš Musil

In this paper we compare structure of Czech word embeddings for English-Czech neural machine translation (NMT), word2vec and sentiment analysis. We show that although it is possible to successfully predict part of speech (POS) tags from word embeddings of word2vec and various translation models, not all of the embedding spaces show the same structure. The information about POS is present in word2vec embeddings, but the high degree of organization by POS in the NMT decoder suggests that this information is more important for machine translation and therefore the NMT model represents it in more direct way. Our method is based on correlation of principal component analysis (PCA) dimensions with categorical linguistic data. We also show that further examining histograms of classes along the principal component is important to understand the structure of representation of information in embeddings.

* 12 pages, 6 figures, accepted to The 22th International Conference of Text, Speech and Dialogue (TSD2019) in Ljubljana 

  Access Paper or Ask Questions

ISNA-Set: A novel English Corpus of Iran NEWS

Aug 21, 2018
Mohammad Kamel, Hadi Sadoghi-Yazdi

News agencies publish news on their websites all over the world. Moreover, creating novel corpuses is necessary to bring natural processing to new domains. Textual processing of online news is challenging in terms of the strategy of collecting data, the complex structure of news websites, and selecting or designing suitable algorithms for processing these types of data. Despite the previous works which focus on creating corpuses for Iran news in Persian, in this paper, we introduce a new corpus for English news of a national news agency. ISNA-Set is a new dataset of English news of Iranian Students News Agency (ISNA), as one of the most famous news agencies in Iran. We statistically analyze the data and the sentiments of news, and also extract entities and part-of-speech tagging.

  Access Paper or Ask Questions