"Sentiment": models, code, and papers

A Computational Approach to Walt Whitman's Stylistic Changes in Leaves of Grass

Nov 09, 2021
Jieyan Zhu

This study analyzes Walt Whitman's stylistic changes in his phenomenal work Leaves of Grass from a computational perspective and relates the findings to standard literary criticism on Whitman. The corpus consists of all 7 editions of Leaves of Grass, ranging from the earliest 1855 edition to the 1891-92 "deathbed" edition. Starting from counting word frequencies, the simplest stylometry technique, we find consistent shifts in word choice. Macro-etymological analysis reveals Whitman's increasing preference for words of specific origins, which correlates with the increasing lexical complexity in Leaves of Grass. Principal component analysis, an unsupervised learning algorithm, reduces the dimensionality of tf-idf vectors to 2 dimensions, providing a straightforward view of stylistic changes. Finally, sentiment analysis shows the evolution of Whitman's emotional state throughout his writing career.

* 22 pages, 3 figures, 7 tables 
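
The tf-idf weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the plain textbook definition (term frequency times the log of inverse document frequency, no smoothing); the abstract does not state which variant or tokenizer the study used, and the two sample strings are hypothetical stand-ins, not text from the corpus. The PCA step is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical stand-ins for two editions; not text from the actual corpus.
editions = [
    "I sing the body and the grass",
    "I sing the soul and the sea",
]
vectors = tf_idf(editions)
```

Terms shared by every document (here "the" or "sing") receive weight zero, so only edition-specific vocabulary contributes to the vectors that a later dimensionality-reduction step would compare.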

Leveraging Recursive Processing for Neural-Symbolic Affect-Target Associations

Mar 05, 2021
A. Sutherland, S. Magg, S. Wermter

Explaining the outcome of deep learning decisions based on affect is challenging but necessary if we expect social companion robots to interact with users on an emotional level. In this paper, we present a commonsense approach that utilizes an interpretable hybrid neural-symbolic system to associate extracted targets, noun chunks determined to be associated with the expressed emotion, with affective labels from a natural language expression. We leverage a pre-trained neural network that is well adapted to tree and sub-tree processing, the Dependency Tree-LSTM, to learn the affect labels of dynamic targets, determined through symbolic rules, in natural language. We find that making use of the unique properties of the recursive network provides higher accuracy and interpretability when compared to other unstructured and sequential methods for determining target-affect associations in an aspect-based sentiment analysis task.

* 6 pages, 5 figures 

A comparative study of Bot Detection techniques methods with an application related to Covid-19 discourse on Twitter

Feb 01, 2021
Marzia Antenore, Jose M. Camacho-Rodriguez, Emanuele Panizzi

Bot detection is an essential asset in a period when Online Social Networks (OSNs) are a part of our lives. The task becomes more relevant during crises such as the Covid-19 pandemic, when there is an incipient risk of social bots proliferating and becoming a possible source of misinformation. To address this issue, different methods for automatically detecting social bots on Twitter are compared using Data Selection. The techniques used to build the bot detection models draw on features such as tweet metadata or the Digital Fingerprint of Twitter accounts. In addition, the presence of bots is analyzed in tweets from different periods of the first months of the Covid-19 pandemic, using the bot detection technique that best fits the scope of the task. This work also analyzes aspects of the discourse of bots and humans, such as sentiment and hashtag use.

* 36 pages, 10 figures, 5 tables 

[email protected]:Identifying Offensive Language from ManglishTweets

Oct 17, 2020
Sara Renjit, Sumam Mary Idicula

With the popularity of social media, communication through blogs, Facebook, Twitter, and other platforms has increased. Initially, English was the only medium of communication. Fortunately, we can now communicate in any language. This has led people to mix English with their native language. Sometimes comments in other languages are written in English transliteration; in other cases, people use the script of the intended language. Identifying sentiment and offensive content in such code-mixed tweets is a necessary task in these times. We present a working model submitted for Task 2 of the sub-track HASOC Offensive Language Identification - DravidianCodeMix at the Forum for Information Retrieval Evaluation, 2020. It is a message-level classification task. In our approach, an embedding model-based classifier identifies offensive and not-offensive comments. We applied this method to the Manglish dataset provided with the sub-track.

Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks

Jul 19, 2020
Diego de Vargas Feijo, Viviane Pereira Moreira

BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models which can later be fine-tuned for a variety of Natural Language Understanding tasks. These methods have been applied to a number of such tasks (mostly in English), achieving results that outperform the previous state of the art. In this paper, our contribution is twofold. First, we make available our trained BERT and ALBERT models for Portuguese. Second, we compare our monolingual models and the standard multilingual models using experiments in semantic textual similarity, recognizing textual entailment, textual category classification, sentiment analysis, offensive comment detection, and fake news detection, to assess the effectiveness of the generated language representations. The results suggest that both monolingual and multilingual models are able to achieve state-of-the-art results and that the advantage of training a single-language model, if any, is small.

Neural Architectures for Fine-Grained Propaganda Detection in News

Sep 13, 2019
Pankaj Gupta, Khushbu Saxena, Usama Yaseen, Thomas Runkler, Hinrich Schütze

This paper describes the details of our system (MIC-CIS) and the results of our participation in the fine-grained propaganda detection shared task 2019. To address the tasks of sentence-level (SLC) and fragment-level (FLC) propaganda detection, we explore different neural architectures (e.g., CNN, LSTM-CRF and BERT) and extract linguistic (e.g., part-of-speech, named entity, readability, sentiment, emotion, etc.), layout and topical features. Specifically, we have designed multi-granularity and multi-tasking neural architectures to jointly perform both sentence-level and fragment-level propaganda detection. Additionally, we investigate different ensemble schemes such as majority-voting, relax-voting, etc. to boost overall system performance. Compared to the other participating systems, our submissions are ranked 3rd and 4th in the FLC and SLC tasks, respectively.

* EMNLP2019: Fine-grained propaganda detection shared task at NLP4IF workshop (EMNLP2019) 
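
Of the ensemble schemes named in the abstract, majority voting is the simplest to illustrate. The sketch below is a generic version, not the authors' implementation: the model names, labels, and predictions are hypothetical, and relax-voting is not shown because the abstract does not define it.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among one example's model predictions.

    Ties are broken in favour of the label seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble(per_model_labels):
    """Combine per-model label lists (one list per model) example by example."""
    return [majority_vote(votes) for votes in zip(*per_model_labels)]


# Hypothetical outputs of three sentence-level classifiers on four sentences.
cnn = ["prop", "none", "none", "prop"]
lstm = ["prop", "prop", "none", "none"]
bert = ["none", "prop", "none", "prop"]
combined = ensemble([cnn, lstm, bert])
```

With an odd number of models and two labels, a tie cannot occur, so the tie-breaking rule only matters for larger label sets or an even number of voters.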

DocBERT: BERT for Document Classification

Apr 17, 2019
Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin

Pre-trained language representation models achieve remarkable state-of-the-art results across a wide range of tasks in natural language processing. One of the latest advancements is BERT, a deep pre-trained transformer that yields much better results than its predecessors do. Despite its burgeoning popularity, however, BERT has not yet been applied to document classification. This task deserves attention, since it contains a few nuances: first, modeling syntactic structure matters less for document classification than for other problems, such as natural language inference and sentiment classification. Second, documents often have multiple labels across dozens of classes, which is uncharacteristic of the tasks that BERT explores. In this paper, we describe fine-tuning BERT for document classification. We are the first to demonstrate the success of BERT on this task, achieving state-of-the-art results across four popular datasets.

* 5 pages, 2 figures. First two authors contributed equally 
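
The multi-label setting the abstract highlights differs from single-label classification in how scores become predictions. One common approach, sketched here as an assumption since the abstract does not describe the output layer, scores each class independently with a sigmoid and keeps every class above a threshold. The class names, scores, and the 0.5 threshold are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, class_names, threshold=0.5):
    """Return every class whose independent sigmoid probability meets the threshold."""
    return [name for name, z in zip(class_names, logits) if sigmoid(z) >= threshold]


# Hypothetical per-class scores for one document.
classes = ["sports", "politics", "finance"]
scores = [2.1, -0.3, 0.8]
labels = predict_labels(scores, classes)
```

Unlike a softmax, which forces the classes to compete for a single label, independent sigmoids let a document receive zero, one, or several labels.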

Structured Content Preservation for Unsupervised Text Style Transfer

Oct 31, 2018
Youzhi Tian, Zhiting Hu, Zhou Yu

Text style transfer aims to modify the style of a sentence while keeping its content unchanged. Recent style transfer systems often fail to faithfully preserve the content after changing the style. This paper proposes a structured content preserving model that leverages linguistic information in the structured fine-grained supervisions to better preserve the style-independent content during style transfer. In particular, we achieve the goal by devising rich model objectives based on both the sentence's lexical information and a language model that conditions on content. The resulting model therefore is encouraged to retain the semantic meaning of the target sentences. We perform extensive experiments that compare our model to other existing approaches in the tasks of sentiment and political slant transfer. Our model achieves significant improvement in terms of both content preservation and style transfer in automatic and human evaluation.

Strong Baselines for Neural Semi-supervised Learning under Domain Shift

Apr 25, 2018
Sebastian Ruder, Barbara Plank

Novel neural models have been proposed in recent years for learning under domain shift. Most models, however, only evaluate on a single task, on proprietary datasets, or compare to weak baselines, which makes comparison of models difficult. In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training. Extensive experiments on two benchmarks are negative: while our novel method establishes a new state-of-the-art for sentiment analysis, it does not fare consistently the best. More importantly, we arrive at the somewhat surprising conclusion that classic tri-training, with some additions, outperforms the state of the art. We conclude that classic approaches constitute an important and strong baseline.

* ACL 2018 
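
The classic tri-training that the abstract uses as a baseline rests on a simple agreement rule: an unlabeled example becomes a pseudo-labeled training example for one model when the other two models agree on its label. The sketch below shows only that selection step, with hypothetical predictions; it does not cover model retraining or the paper's multi-task variant.

```python
def pseudo_labels_for(target, predictions):
    """Collect pseudo-labels for model `target` from the other two models.

    predictions: three equal-length lists of predicted labels, one per model.
    Returns {example_index: label} for examples where the two other models agree.
    """
    j, k = [m for m in range(3) if m != target]
    return {
        idx: a
        for idx, (a, b) in enumerate(zip(predictions[j], predictions[k]))
        if a == b
    }


# Hypothetical predictions of three models on five unlabeled reviews.
preds = [
    ["pos", "neg", "pos", "neg", "pos"],
    ["pos", "neg", "neg", "neg", "pos"],
    ["neg", "neg", "neg", "pos", "pos"],
]
new_for_model_0 = pseudo_labels_for(0, preds)
```

The target model's own prediction is deliberately ignored, so it can learn from examples it currently gets wrong, such as index 2 above.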

Bayesian Paragraph Vectors

Dec 07, 2017
Geng Ji, Robert Bamler, Erik B. Sudderth, Stephan Mandt

Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014) find fixed-length representations for pieces of text with arbitrary lengths, such as documents, paragraphs, and sentences. In this work, we propose a novel interpretation for neural-network-based paragraph vectors by developing an unsupervised generative model whose maximum likelihood solution corresponds to traditional paragraph vectors. This probabilistic formulation allows us to go beyond point estimates of parameters and to perform Bayesian posterior inference. We find that the entropy of paragraph vectors decreases with the length of documents, and that information about posterior uncertainty improves performance in supervised learning tasks such as sentiment analysis and paraphrase detection.

* Presented at the NIPS 2017 workshop "Advances in Approximate Bayesian Inference" 
