Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Sentiment": models, code, and papers

Distilling neural networks into skipgram-level decision lists

May 18, 2020
Madhumita Sushil, Simon Šuster, Walter Daelemans

Several previous studies on explanation for recurrent neural networks focus on approaches that find the most important input segments for a network as its explanations. In that case, the manner in which these input segments combine with each other to form an explanatory pattern remains unknown. To overcome this, some previous work tries to find patterns (called rules) in the data that explain neural outputs. However, their explanations are often insensitive to model parameters, which limits the scalability of text explanations. To overcome these limitations, we propose a pipeline to explain RNNs by means of decision lists (also called rules) over skipgrams. For evaluation of explanations, we create a synthetic sepsis-identification dataset, as well as apply our technique on additional clinical and sentiment analysis datasets. We find that our technique persistently achieves high explanation fidelity and qualitatively interpretable rules.

  Access Paper or Ask Questions

Pruning and Sparsemax Methods for Hierarchical Attention Networks

Apr 08, 2020
João G. Ribeiro, Frederico S. Felisberto, Isabel C. Neto

This paper introduces and evaluates two novel Hierarchical Attention Network models [Yang et al., 2016] - i) Hierarchical Pruned Attention Networks, which remove the irrelevant words and sentences from the classification process in order to reduce potential noise in the document classification accuracy and ii) Hierarchical Sparsemax Attention Networks, which replace the Softmax function used in the attention mechanism with the Sparsemax [Martins and Astudillo, 2016], capable of better handling importance distributions where a lot of words or sentences have very low probabilities. Our empirical evaluation on the IMDB Review for sentiment analysis datasets shows both approaches to be able to match the results obtained by the current state-of-the-art (without, however, any significant benefits). All our source code is made available at

  Access Paper or Ask Questions

Robustness Verification for Transformers

Feb 16, 2020
Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh

Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding model behavior and obtaining safety guarantees. However, previous methods can usually only handle neural networks with relatively simple architectures. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous works. We resolve these challenges and develop the first robustness verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers as they consistently reflect the importance of different words in sentiment analysis.

* ICLR 2020 

  Access Paper or Ask Questions

FastWordBug: A Fast Method To Generate Adversarial Text Against NLP Applications

Jan 31, 2020
Dou Goodman, Lv Zhonghou, Wang minghua

In this paper, we present a novel algorithm, FastWordBug, to efficiently generate small text perturbations in a black-box setting that forces a sentiment analysis or text classification mode to make an incorrect prediction. By combining the part of speech attributes of words, we propose a scoring method that can quickly identify important words that affect text classification. We evaluate FastWordBug on three real-world text datasets and two state-of-the-art machine learning models under black-box setting. The results show that our method can significantly reduce the accuracy of the model, and at the same time, we can call the model as little as possible, with the highest attack efficiency. We also attack two popular real-world cloud services of NLP, and the results show that our method works as well.

  Access Paper or Ask Questions

Single Training Dimension Selection for Word Embedding with PCA

Aug 30, 2019
Yu Wang

In this paper, we present a fast and reliable method based on PCA to select the number of dimensions for word embeddings. First, we train one embedding with a generous upper bound (e.g. 1,000) of dimensions. Then we transform the embeddings using PCA and incrementally remove the lesser dimensions one at a time while recording the embeddings' performance on language tasks. Lastly, we select the number of dimensions while balancing model size and accuracy. Experiments using various datasets and language tasks demonstrate that we are able to train 10 times fewer sets of embeddings while retaining optimal performance. Researchers interested in training the best-performing embeddings for downstream tasks, such as sentiment analysis, question answering and hypernym extraction, as well as those interested in embedding compression should find the method helpful.

* 6 pages, 5 figures, 2 tables, EMNLP 2019 

  Access Paper or Ask Questions

Deep contextualized word representations

Mar 22, 2018
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

* NAACL 2018. Originally posted to openreview 27 Oct 2017. v2 updated for NAACL camera ready 

  Access Paper or Ask Questions

Exponential Machines

Dec 08, 2017
Alexander Novikov, Mikhail Trofimov, Ivan Oseledets

Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and lets you control the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and that it works on par with high-order factorization machines on a recommender system dataset MovieLens 100K.

* ICLR-2017 workshop track paper 

  Access Paper or Ask Questions

Impact of Feature Selection on Micro-Text Classification

Aug 27, 2017
Ankit Vadehra, Maura R. Grossman, Gordon V. Cormack

Social media datasets, especially Twitter tweets, are popular in the field of text classification. Tweets are a valuable source of micro-text (sometimes referred to as "micro-blogs"), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others. Tweets often include keywords referred to as "Hashtags" that can be used as labels for the tweet. Using tweets encompassing 50 labels, we studied the impact of word versus character-level feature selection and extraction on different learners to solve a multi-class classification task. We show that feature extraction of simple character-level groups performs better than simple word groups and pre-processing methods like normalizing using Porter's Stemming and Part-of-Speech ("POS")-Lemmatization.

* 4 pages, 6 figures 

  Access Paper or Ask Questions

Automatic Identification of Sarcasm Target: An Introductory Approach

Aug 25, 2017
Aditya Joshi, Pranav Goel, Pushpak Bhattacharyya, Mark Carman

Past work in computational sarcasm deals primarily with sarcasm detection. In this paper, we introduce a novel, related problem: sarcasm target identification i.e., extracting the target of ridicule in a sarcastic sentence). We present an introductory approach for sarcasm target identification. Our approach employs two types of extractors: one based on rules, and another consisting of a statistical classifier. To compare our approach, we use two baselines: a na\"ive baseline and another baseline based on work in sentiment target identification. We perform our experiments on book snippets and tweets, and show that our hybrid approach performs better than the two baselines and also, in comparison with using the two extractors individually. Our introductory approach establishes the viability of sarcasm target identification, and will serve as a baseline for future work.

  Access Paper or Ask Questions