"Sentiment": models, code, and papers

TextDecepter: Hard Label Black Box Attack on Text Classifiers

Aug 16, 2020
Sachin Saxena

Machine learning models have been shown to be susceptible to carefully crafted samples, known as adversarial examples. Generating these adversarial examples helps make models more robust and gives us insight into their underlying decision making. Over the years, researchers have successfully attacked image classifiers in both white-box and black-box settings. However, these methods are not directly applicable to text, because text data is discrete in nature. In recent years, research on crafting adversarial examples against textual applications has been on the rise. In this paper, we present a novel approach for hard-label black-box attacks against Natural Language Processing (NLP) classifiers, where no model information is disclosed and an attacker can only query the model to get the classifier's final decision, without the confidence scores of the classes involved. Such an attack scenario applies to real-world black-box models used in security-sensitive applications such as sentiment analysis and toxic content detection.

* 8 pages, 11 tables 
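To illustrate the hard-label threat model described above, the Python sketch below tries single-word synonym substitutions against a classifier that returns only a final label, and stops as soon as the label flips or a query budget runs out. This is a generic illustration, not TextDecepter's actual algorithm, which the abstract does not detail. The names hard_label_attack and toy_sentiment, the keyword-based target model, and the hand-written synonym table are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional

# A hard-label oracle: given tokens, it returns only the predicted class name,
# never confidence scores (the setting described in the abstract).
HardLabelClassifier = Callable[[List[str]], str]


def hard_label_attack(
    tokens: List[str],
    classify: HardLabelClassifier,
    synonyms: Dict[str, List[str]],
    max_queries: int = 100,
) -> Optional[List[str]]:
    """Try single-word synonym swaps until the predicted label changes.

    Only final labels are observed, so the search cannot rank candidates by
    score; it simply checks whether each substitution flips the decision.
    Returns the perturbed tokens, or None if no flip is found within budget.
    """
    original_label = classify(tokens)
    queries = 1
    for i, word in enumerate(tokens):
        for candidate in synonyms.get(word, []):
            if queries >= max_queries:
                return None  # query budget exhausted
            trial = tokens[:i] + [candidate] + tokens[i + 1:]
            queries += 1
            if classify(trial) != original_label:
                return trial  # label flipped: adversarial example found
    return None


# Hypothetical toy target model: counts positive vs. negative keywords.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}


def toy_sentiment(tokens: List[str]) -> str:
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great"]
    table = {"great": ["excellent", "fine"]}
    print(hard_label_attack(sentence, toy_sentiment, table))
    # prints ['the', 'movie', 'was', 'fine']
```

Because the oracle exposes no scores, each query yields only one bit of feedback (flipped or not), which is why practical hard-label attacks care so much about query budgets and about choosing which words to perturb first.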

Generating Diverse Story Continuations with Controllable Semantics

Sep 30, 2019
Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel

We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs. We focus on the setting of generating the next sentence of a story given its context. As controllable dimensions, we consider several sentence attributes, including sentiment, length, predicates, frames, and automatically-induced clusters. Our empirical results demonstrate: (1) our framework is accurate in terms of generating outputs that match the target control values; (2) our model yields increased maximum metric scores compared to standard n-best list generation via beam search; (3) controlling generation with semantic frames leads to a stronger combination of diversity and quality than other control variables as measured by automatic metrics. We also conduct a human evaluation to assess the utility of providing multiple suggestions for creative writing, demonstrating promising results for the potential of controllable, diverse generation in a collaborative writing system.

* EMNLP 2019 Workshop on Neural Generation and Translation (WNGT2019), and non-archival acceptance in NeuralGen 2019 

Language Model Pre-training for Hierarchical Document Representations

Jan 26, 2019
Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis. However, effective usage of such a large context can be difficult to learn, especially when limited labeled data is available. Building on the recent success of language model pre-training methods for learning flat representations of text, we propose algorithms for pre-training hierarchical document representations from unlabeled data. Unlike prior work, which has focused on pre-training contextual token representations or context-independent sentence/paragraph representations, our hierarchical document representations include fixed-length sentence/paragraph representations that integrate contextual information from the entire document. Experiments on document segmentation, document-level question answering, and extractive document summarization demonstrate the effectiveness of the proposed pre-training algorithms.

Unsupervised Aspect Term Extraction with B-LSTM & CRF using Automatically Labelled Datasets

Sep 15, 2017
Athanasios Giannakopoulos, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

Aspect Term Extraction (ATE) identifies opinionated aspect terms in texts and is one of the tasks in the SemEval Aspect Based Sentiment Analysis (ABSA) contest. The small number of available datasets for supervised ATE and the cost of human annotation for aspect term labelling create a need for unsupervised ATE. In this paper, we introduce an architecture that achieves top-ranking performance for supervised ATE. Moreover, it can be used efficiently as a feature extractor and classifier for unsupervised ATE. Our second contribution is a method to automatically construct datasets for ATE. We train a classifier on our automatically labelled datasets and evaluate it on the human-annotated SemEval ABSA test sets. Compared to a strong rule-based baseline, we obtain a dramatically higher F-score and attain precision values above 80%. Our unsupervised method beats the supervised ABSA baseline from SemEval while preserving high precision scores.

* 9 pages, 3 figures, 2 tables; 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA), EMNLP 2017 
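Aspect term extraction is commonly framed as sequence labeling, where a tagger such as a B-LSTM with a CRF output layer assigns each token a B (begin), I (inside), or O (outside) tag. Assuming that tagging scheme (the abstract does not spell it out), the short Python helper below shows how predicted tags are decoded into aspect-term strings. The function name bio_to_spans, the example sentence, and its tags are hypothetical.

```python
from typing import List


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Decode B/I/O tags into multi-word aspect terms.

    An "I" tag that does not follow a "B" or "I" is treated like "O".
    """
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:  # close the previous term before starting a new one
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)  # extend the open term
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:  # flush a term that runs to the end of the sentence
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    words = "The battery life is great but the screen is dim".split()
    labels = ["O", "B", "I", "O", "O", "O", "O", "B", "O", "O"]
    print(bio_to_spans(words, labels))  # prints ['battery life', 'screen']
```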

Joint Named Entity Recognition and Stance Detection in Tweets

Jul 30, 2017
Dilek Küçük

Named entity recognition (NER) is a well-established information extraction task that has been studied for decades. More recently, studies reporting NER experiments on social media texts have emerged. Stance detection, on the other hand, is a relatively new research topic usually considered within the scope of sentiment analysis. Stance detection studies are mostly applied to texts of online debates, where the stance of the text's author toward a particular target, either explicitly or implicitly mentioned in the text, is explored. In this study, we investigate the possible contribution of named entities to the stance detection task in tweets. We report the evaluation results of NER experiments as well as those of the subsequent stance detection experiments using named entities, on a publicly available stance-annotated data set of tweets. Our results indicate that named entities obtained with a high-performance NER system can contribute to stance detection performance on tweets.

* 5 pages 

Supervised Fine Tuning for Word Embedding with Integrated Knowledge

May 29, 2015
Xuefeng Yang, Kezhi Mao

Learning vector representations for words is an important research field that may benefit many natural language processing tasks. Nearly all available models share two limitations: bias caused by the context definition and the lack of knowledge utilization. These limitations are difficult to tackle because such algorithms are essentially unsupervised learning approaches. Inspired by deep learning, the authors propose a supervised framework for learning vector representations of words that provides additional supervised fine-tuning after unsupervised learning. The framework is a knowledge-rich approach and is compatible with any numerical vector word representation. The authors perform both intrinsic evaluations, such as attributional and relational similarity prediction, and extrinsic evaluations, such as sentence completion and sentiment analysis. Experimental results on 6 embeddings and 4 tasks with 10 datasets show that the proposed fine-tuning framework may significantly improve the quality of the vector representations of words.

Detection of Dangerous Events on Social Media: A Perspective Review

Apr 04, 2022
M. Luqman Jamil, Sebastião Pais, João Cordeiro

Social media is an essential gateway of information and communication for people worldwide. The amount of time people spend on social media, and their reliance on it, make it a vital resource for detecting events happening in real life. Thousands of significant events are posted by users every hour in the form of multimedia. Some individuals and groups target this audience to promote their agenda. Their cause can threaten other groups and individuals who do not share the same views or who differ from them in specific ways. Any group with a definitive cause cannot survive without support, which acts as a catalyst for its agenda. This leads to a phenomenon in which people are fed information that motivates them to act on the group's behalf and carry out its agenda. One party's benefit results in the loss of others, putting their lives, assets, and physical and emotional health in danger. This paper introduces the concept of dangerous events to approach this problem and defines three main types based on their characteristics: action, scenario, and sentiment-based dangerous events.

SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining

Oct 20, 2021
Gerhard Hagerer, Martin Kirchhoff, Hannah Danner, Robert Pesch, Mainak Ghosh, Archishman Roy, Jiaxi Zhao, Georg Groh

Recent research in opinion mining has proposed word embedding-based topic modeling methods that provide superior coherence compared to traditional topic modeling. In this paper, we demonstrate how these methods can be used to display correlated topic models on social media texts using SocialVisTUM, our proposed interactive visualization toolkit. It displays a graph with topics as nodes and their correlations as edges. Further details are displayed interactively to support the exploration of large text collections, e.g., representative words and sentences of topics, topic and sentiment distributions, hierarchical topic clustering, and customizable, predefined topic labels. The toolkit automatically optimizes its models on custom data for the best coherence. We show a working instance of the toolkit on data crawled from English social media discussions about organic food consumption. The visualization confirms the findings of a qualitative consumer research study. SocialVisTUM and its training procedures are accessible online.

* RANLP-2021 
* Demo paper accepted for publication at RANLP 2021; 8 pages, 5 figures, 1 table 
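SocialVisTUM's graph view places topics as nodes and their correlations as edges. One simple way to derive such edges, assumed here rather than taken from the paper, is to compute the Pearson correlation between topics' per-document proportions and keep the pairs whose absolute correlation reaches a threshold. The function names, the 0.5 threshold, and the toy document-topic matrix below are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def topic_correlation_edges(
    doc_topic: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Return (topic_i, topic_j, r) for topic pairs with |r| >= threshold.

    doc_topic[d][k] is the proportion of topic k in document d.
    """
    num_topics = len(doc_topic[0])
    # One column of proportions per topic, across all documents.
    columns = [[row[k] for row in doc_topic] for k in range(num_topics)]
    edges = []
    for i, j in combinations(range(num_topics), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            edges.append((i, j, r))
    return edges


if __name__ == "__main__":
    # Toy matrix: 4 documents x 3 topics.
    matrix = [
        [0.6, 0.3, 0.1],
        [0.2, 0.1, 0.7],
        [0.5, 0.4, 0.1],
        [0.1, 0.2, 0.7],
    ]
    for i, j, r in topic_correlation_edges(matrix):
        print(f"topic {i} -- topic {j}: r = {r:+.2f}")
```

Keeping signed correlations lets a viewer distinguish topics that tend to co-occur (positive r) from topics that tend to displace each other (negative r); the actual toolkit may compute and filter correlations differently.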

Nora: The Well-Being Coach

Jun 01, 2021
Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, Pascale Fung

The current pandemic has forced people globally to remain in isolation and practice social distancing, which creates the need for a system to combat the resulting loneliness and negative emotions. In this paper, we propose Nora, a virtual coaching platform designed to utilize natural language understanding in its dialogue system and to make recommendations based on user interactions. It is intended to provide assistance and companionship to people undergoing self-quarantine or work-from-home routines. Nora helps users gauge their well-being by detecting and recording their emotion, sentiment, and stress. Nora also recommends various workout, meditation, or yoga exercises to help users develop a healthy daily routine. In addition, we provide a social community inside Nora, where users can connect and share their experiences with others undergoing a similar isolation procedure. Nora can be accessed from anywhere via a web link and supports both English and Mandarin.

* 7 pages 

DBATES: DataBase of Audio features, Text, and visual Expressions in competitive debate Speeches

Mar 26, 2021
Taylan K. Sen, Gazi Naven, Luke Gerstner, Daryl Bagley, Raiyan Abdul Baten, Wasifur Rahman, Kamrul Hasan, Kurtis G. Haut, Abdullah Mamun, Samiha Samrose, Anne Solbu, R. Eric Barnes, Mark G. Frank, Ehsan Hoque

In this work, we present a database of multimodal communication features extracted from debate speeches in the 2019 North American Universities Debate Championships (NAUDC). Feature sets were extracted from the visual (facial expression, gaze, and head pose), audio (PRAAT), and textual (word sentiment and linguistic category) modalities of raw video recordings of competitive collegiate debaters (N=717 6-minute recordings from 140 unique debaters). Each speech has an associated competition debate score (range: 67-96) from expert judges, as well as competitor demographic and per-round reflection surveys. We observe that the fully multimodal model performs best in comparison to models trained on various compositions of modalities. We also find that the weights of some features (such as the expression of joy and the use of the word "we") change direction between the aforementioned models. We use these results to highlight the value of a multimodal dataset for studying competitive collegiate debate.

* 12 pages, 5 figures, 4 tables, undergoing major revision for TAC 
